Evidence-based or data-driven dynamic treatment regimes are essential for personalized medicine and can benefit from offline reinforcement learning (RL). Although massive healthcare data are available across medical institutions, privacy constraints prohibit sharing them. Moreover, heterogeneity exists across sites. Federated offline RL algorithms are therefore necessary and promising for addressing these problems. In this paper, we propose a multi-site Markov decision process model that allows both homogeneous and heterogeneous effects across sites. The proposed model makes the analysis of site-level features possible. We design the first federated policy optimization algorithm for offline RL with sample complexity guarantees. The proposed algorithm is communication-efficient and privacy-preserving, requiring only a single round of communication in which summary statistics are exchanged. We provide a theoretical guarantee for the proposed algorithm without assuming sufficient action coverage: the suboptimality of the learned policy is comparable to the rate achieved as if the data were not distributed. Extensive simulations demonstrate the effectiveness of the proposed algorithm. The method is applied to a multi-site sepsis data set to illustrate its use in clinical settings.