Data-pooling Reinforcement Learning for Personalized Healthcare Intervention

Motivated by the emerging need for personalized preventative intervention in many healthcare applications, we consider a multi-stage, dynamic decision-making problem in the online setting with unknown model parameters. To deal with the pervasive issue of small sample size in personalized planning, we develop a novel data-pooling reinforcement learning (RL) algorithm based on a general perturbed value iteration framework. Our algorithm adaptively pools historical data, with three main innovations: (i) the pooling weight is tied directly to decision performance (measured by regret), as opposed to estimation accuracy in conventional methods; (ii) no parametric assumptions are needed between historical and current data; and (iii) data sharing is required only via aggregate statistics, as opposed to patient-level data. Our data-pooling framework applies to a variety of popular RL algorithms, and we establish a theoretical performance guarantee showing that the pooling version achieves a regret bound strictly smaller than that of its no-pooling counterpart. We substantiate the theory with a case study on post-discharge intervention to prevent unplanned readmissions, in which our algorithm performs better empirically and which generates practical insights for healthcare management. In particular, our algorithm alleviates privacy concerns about sharing health data, which (i) opens the door for individual organizations to leverage public datasets or published studies to better manage their own patients; and (ii) provides a basis for public policy makers to encourage organizations to share aggregate data to improve population health outcomes for the broader community.
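
As a rough illustration of pooling through aggregate statistics only, the sketch below combines a local summary with a historical summary using a weight supplied by the caller. It is not the algorithm described above: the names `AggregateStats` and `pooled_mean` are hypothetical, the convex-combination form is an assumption, and the regret-driven rule for choosing the weight is deliberately left out because the abstract does not specify it.

```python
from dataclasses import dataclass


@dataclass
class AggregateStats:
    """Summary statistics only; no patient-level records are stored."""
    n: int        # sample size behind the summary (assumed to be shared as well)
    mean: float   # sample mean of the observed outcome


def pooled_mean(local: AggregateStats, historical: AggregateStats, weight: float) -> float:
    """Return a convex combination of the local and historical means.

    `weight` is the share given to historical data: 0 means no pooling,
    1 means relying on historical data entirely. Choosing it is the hard
    part and is NOT implemented here; this function only applies it.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * local.mean + weight * historical.mean


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the case study.
    local = AggregateStats(n=8, mean=0.25)
    historical = AggregateStats(n=5000, mean=0.15)
    for w in (0.0, 0.5, 1.0):
        print(w, pooled_mean(local, historical, w))
```

The point of the sketch is that the historical side enters only through `n` and `mean`, so an organization could contribute without exposing individual records.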
