Recent advances in mobile health (mHealth) technology provide an effective way to monitor individuals' health statuses and deliver just-in-time personalized interventions. However, the practical use of mHealth technology raises unique challenges for existing methodologies for learning an optimal dynamic treatment regime. Many mHealth applications involve decision making among a large number of intervention options under an infinite-horizon setting in which the number of decision stages diverges to infinity. In addition, temporary medication shortages may cause optimal treatments to be unavailable, and it is unclear which alternatives should be used. To address these challenges, we propose a Proximal Temporal consistency Learning (pT-Learning) framework to estimate an optimal regime that is adaptively adjusted between deterministic and stochastic sparse policy models. The resulting minimax estimator avoids the double-sampling issue present in existing algorithms. It can be further simplified and can easily incorporate off-policy data without requiring mismatched distribution corrections. We study theoretical properties of the sparse policy and establish finite-sample bounds on the excess risk and performance error. The proposed method is implemented in our proximalDTR package and is evaluated through extensive simulation studies and the OhioT1DM mHealth dataset.
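To make the sparse-policy idea concrete, the minimal sketch below shows a sparsemax-style Euclidean projection of action scores onto the probability simplex, one standard way to obtain a policy that places exactly zero mass on weak actions and so interpolates between a deterministic argmax rule and a stochastic one. This is an illustrative assumption, not the implementation in the proximalDTR package; the function name and interface are hypothetical.

```python
import numpy as np

def sparsemax(z):
    """Illustrative sparsemax projection (Martins & Astudillo, 2016).

    Projects a score vector z onto the probability simplex. Unlike
    softmax, the output can be exactly zero for low-scoring actions,
    so the induced policy is deterministic when one action dominates
    and stochastic when several actions have comparable scores.
    NOTE: a hypothetical sketch of a sparse policy map, not the
    pT-Learning estimator itself.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in decreasing order
    cssv = np.cumsum(z_sorted)             # cumulative sums of sorted scores
    k = np.arange(1, len(z) + 1)
    support = z_sorted - (cssv - 1.0) / k > 0   # actions kept in the support
    k_star = k[support][-1]                # size of the support
    tau = (cssv[support][-1] - 1.0) / k_star    # soft threshold
    return np.maximum(z - tau, 0.0)

# One dominant action -> deterministic policy: [1., 0., 0.]
print(sparsemax([5.0, 0.0, 0.0]))
# Two comparable actions -> stochastic over those two, zero elsewhere
print(sparsemax([2.0, 1.9, -1.0]))
```

With many intervention options, this kind of projection keeps the policy supported on a small subset of near-optimal actions, which also suggests natural alternatives when the top-ranked treatment is temporarily unavailable.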