In many public health settings, patients need to adhere to health programs, for example by taking medications and attending periodic health checks. Unfortunately, beneficiaries may gradually disengage from such programs, which is detrimental to their health. A concrete example of gradual disengagement has been observed by an organization that runs a free automated call-based program for spreading preventive care information among pregnant women: many women stop picking up calls a few months after enrolling. To avoid such disengagement, it is important to provide timely interventions. However, such interventions are often expensive and can be provided to only a small fraction of the beneficiaries. We model this scenario as a restless multi-armed bandit (RMAB) problem, in which each beneficiary is assumed to transition from one state to another depending on the intervention. Moreover, since the transition probabilities are unknown a priori, we propose a Whittle-index-based Q-learning mechanism and show that it converges to the optimal solution. Our method improves over existing learning-based methods for RMABs on multiple benchmarks from the literature and also on the maternal healthcare dataset.
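To make the setting concrete, the following is a minimal sketch of index-style Q-learning on a toy RMAB, not the mechanism proposed in the paper. It assumes two-state arms (0 = disengaged, 1 = engaged), a discounted reward equal to the current state, and made-up transition probabilities. It ranks arms by the estimated gap Q(s, 1) - Q(s, 0) as a simple stand-in for a Whittle index. The function name, parameters, and dynamics are all hypothetical.

```python
import random


def simulate_index_q_learning(n_arms=5, budget=1, steps=2000, gamma=0.95, seed=0):
    """Toy index-based Q-learning on two-state restless arms (illustrative only)."""
    rng = random.Random(seed)
    # Hypothetical dynamics: p_engaged[a][s] = P(next state is 1 | state s, action a).
    # Action 1 (intervene) makes the engaged state more likely than action 0.
    p_engaged = {0: [0.1, 0.6], 1: [0.5, 0.9]}
    states = [rng.randint(0, 1) for _ in range(n_arms)]
    # q[i][s][a]: estimated discounted value of arm i in state s under action a.
    q = [[[0.0, 0.0] for _ in range(2)] for _ in range(n_arms)]
    visits = [[[0, 0] for _ in range(2)] for _ in range(n_arms)]
    pulls = 0

    for t in range(steps):
        # Decaying exploration rate (an assumed schedule).
        eps = n_arms / (n_arms + t)
        if rng.random() < eps:
            chosen = set(rng.sample(range(n_arms), budget))
        else:
            # Index proxy: estimated benefit of intervening in the current state.
            index = [q[i][states[i]][1] - q[i][states[i]][0] for i in range(n_arms)]
            ranked = sorted(range(n_arms), key=lambda i: index[i], reverse=True)
            chosen = set(ranked[:budget])
        pulls += len(chosen)

        # Every arm transitions each step, whether or not it was chosen (restless).
        for i in range(n_arms):
            s = states[i]
            a = 1 if i in chosen else 0
            s_next = 1 if rng.random() < p_engaged[a][s] else 0
            reward = float(s)  # assumed reward: 1 while engaged, 0 otherwise
            visits[i][s][a] += 1
            alpha = 1.0 / visits[i][s][a]  # per-pair decaying learning rate
            target = reward + gamma * max(q[i][s_next])
            q[i][s][a] += alpha * (target - q[i][s][a])
            states[i] = s_next

    return q, pulls
```

Calling `simulate_index_q_learning(n_arms=4, budget=2, steps=500)` returns the learned Q-table and the total number of interventions used, which always equals `budget * steps` because exactly `budget` arms are selected per step.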