Recent advances in mobile health (mHealth) technology provide an effective way to monitor individuals' health statuses and deliver just-in-time, personalized interventions. However, the practical use of mHealth technology raises unique challenges for existing methodologies for learning an optimal dynamic treatment regime. Many mHealth applications involve decision-making with a large number of intervention options in an infinite-horizon setting, where the number of decision stages diverges to infinity. In addition, temporary medication shortages may make optimal treatments unavailable, and it is unclear which alternatives should be used. To address these challenges, we propose a Proximal Temporal consistency Learning (pT-Learning) framework to estimate an optimal regime that is adaptively adjusted between deterministic and stochastic sparse policy models. The resulting minimax estimator avoids the double sampling issue of existing algorithms. It can be further simplified and can easily incorporate off-policy data without mismatched distribution corrections. We study theoretical properties of the sparse policy and establish finite-sample bounds on the excess risk and performance error. The proposed method is implemented in our proximalDTR package and is evaluated through extensive simulation studies and the OhioT1DM mHealth dataset.
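For intuition on how a single policy model can interpolate between deterministic and stochastic behavior, the sketch below uses a sparsemax-style action distribution (Martins and Astudillo, 2016), which projects action scores onto the probability simplex. This is an illustrative assumption about the form of the sparse policy, not the pT-Learning estimator or the proximalDTR API; the function name sparsemax and the toy score vectors are ours.

```python
import numpy as np

def sparsemax(scores):
    """Project a score vector onto the probability simplex
    (sparsemax of Martins and Astudillo, 2016).

    The result is a sparse probability vector over actions: it puts
    all mass on one action when its score clearly dominates
    (deterministic policy) and spreads mass over several near-optimal
    actions when their scores are close (stochastic policy).
    """
    z = np.asarray(scores, dtype=float)
    z_sorted = np.sort(z)[::-1]           # scores in decreasing order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    # support size: largest k with 1 + k * z_(k) > sum of the top-k scores
    in_support = 1.0 + k * z_sorted > cumsum
    k_star = k[in_support][-1]
    tau = (cumsum[in_support][-1] - 1.0) / k_star   # threshold
    return np.maximum(z - tau, 0.0)

# One clearly best treatment -> point mass (deterministic choice):
print(sparsemax([5.0, 0.0, 0.0]))   # [1. 0. 0.]
# Two comparable treatments -> mass split over both, so a feasible
# alternative remains available if the top treatment is out of stock:
print(sparsemax([2.0, 1.9, -1.0]))  # [0.55 0.45 0.  ]
```

Under this kind of policy class, a medication shortage is handled gracefully: when several treatments have near-optimal scores, the policy assigns them positive probability, so a close alternative can be recommended when the top choice is unavailable.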