Policy Optimization for Personalized Interventions in Behavioral Health

Problem definition: Behavioral health interventions, delivered through digital platforms, have the potential to significantly improve health outcomes, through education, motivation, reminders, and outreach. We study the problem of optimizing personalized interventions for patients to maximize some long-term outcome, in a setting where interventions are costly and capacity-constrained.

Methodology/results: This paper provides a model-free approach to solving this problem. We find that generic model-free approaches from the reinforcement learning literature are too data intensive for healthcare applications, while simpler bandit approaches make progress at the expense of ignoring long-term patient dynamics. We present a new algorithm we dub DecompPI that approximates one step of policy iteration. Implementing DecompPI simply consists of a prediction task from offline data, alleviating the need for online experimentation. Theoretically, we show that under a natural set of structural assumptions on patient dynamics, DecompPI surprisingly recovers at least 1/2 of the improvement possible between a naive baseline policy and the optimal policy. At the same time, DecompPI is both robust to estimation errors and interpretable. Through an empirical case study on a mobile health platform for improving treatment adherence for tuberculosis, we find that DecompPI can provide the same efficacy as the status quo with approximately half the capacity of interventions.

Managerial implications: DecompPI is general and is easily implementable for organizations aiming to improve long-term behavior through targeted interventions. Our case study suggests that the platform's costs of deploying interventions can potentially be cut by 50%, which facilitates the ability to scale up the system in a cost-efficient fashion.
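
The abstract describes DecompPI only at a high level: one approximate step of policy iteration, implemented as a prediction task on offline data, under a capacity constraint. As a rough illustration of that general idea, and not the paper's actual algorithm, the sketch below assumes that two predictions per patient are already available from some offline model: the estimated long-term outcome if the patient receives an intervention now and the baseline policy is followed afterwards, and the same quantity without an intervention. It then assigns the limited intervention slots to the patients with the largest positive estimated gain. The function name, the inputs, and the numbers are all hypothetical.

```python
import numpy as np

def top_k_by_estimated_gain(q_intervene, q_no_intervene, capacity):
    """Choose up to `capacity` patients with the largest positive estimated gain.

    q_intervene / q_no_intervene are assumed (hypothetical) predictions of each
    patient's long-term outcome with and without an intervention today,
    fitted beforehand on offline data; fitting them is outside this sketch.
    """
    gain = np.asarray(q_intervene, dtype=float) - np.asarray(q_no_intervene, dtype=float)
    # Sort patient indices by estimated gain, largest first (stable for ties).
    order = np.argsort(-gain, kind="stable")
    # Respect the capacity limit and skip patients with no estimated benefit.
    chosen = [int(i) for i in order[:capacity] if gain[i] > 0]
    return chosen, gain

# Made-up predictions for five patients (illustrative only).
q1 = [0.80, 0.55, 0.90, 0.40, 0.70]  # predicted outcome with an intervention
q0 = [0.60, 0.50, 0.88, 0.45, 0.40]  # predicted outcome without one
chosen, gain = top_k_by_estimated_gain(q1, q0, capacity=2)
print(chosen)  # indices of the patients selected for today's interventions
```

With a capacity of two, the sketch selects the two patients whose predicted outcome improves most from an intervention. Ranking by the difference between the two predictions, rather than by predicted risk alone, is what ties the choice to the estimated effect of intervening.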
