Dynamic treatment regimes assign personalized treatments to patients sequentially over time, based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm that integrates principal component analysis (PCA) with deep Q-learning to handle such mixed-frequency data. Theoretically, we prove that the mean return under the estimated optimal policy converges to that under the optimal policy, and we establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
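To make the idea of combining PCA with Q-learning on mixed-frequency data concrete, the following is a minimal illustrative sketch, not the algorithm of this paper. It assumes that each decision point has a few low-frequency covariates and a long vector of high-frequency measurements, compresses the latter with PCA, and then runs fitted Q-iteration on the concatenated state. For brevity, a linear least-squares Q-function stands in for the deep Q-network. All function names, dimensions, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_pca(x, n_components):
    """Return the column means and the top principal directions of x."""
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(x, mean, components):
    """Compress high-frequency measurements into principal component scores."""
    return (x - mean) @ components.T


def add_intercept(s):
    return np.hstack([np.ones((s.shape[0], 1)), s])


def fitted_q_iteration(s, a, r, s_next, n_actions, gamma=0.9, n_iters=20):
    """Fitted Q-iteration with one linear model per action.

    Assumption: a linear Q-function replaces the deep network for brevity.
    """
    x, x_next = add_intercept(s), add_intercept(s_next)
    w = np.zeros((n_actions, x.shape[1]))
    for _ in range(n_iters):
        # Bellman target built from the current Q estimate.
        target = r + gamma * (x_next @ w.T).max(axis=1)
        new_w = w.copy()
        for act in range(n_actions):
            mask = a == act
            if mask.any():
                new_w[act] = np.linalg.lstsq(x[mask], target[mask], rcond=None)[0]
        w = new_w
    return w


def greedy_action(w, s):
    """Pick the action with the largest estimated Q-value for each state."""
    return (add_intercept(s) @ w.T).argmax(axis=1)


# Synthetic transitions (hypothetical sizes): 3 low-frequency covariates and
# 48 high-frequency readings per decision point, 2 candidate treatments.
rng = np.random.default_rng(0)
n, n_low, n_high, k, n_actions = 300, 3, 48, 4, 2
low, low_next = rng.normal(size=(n, n_low)), rng.normal(size=(n, n_low))
high, high_next = rng.normal(size=(n, n_high)), rng.normal(size=(n, n_high))
actions = rng.integers(0, n_actions, size=n)
rewards = rng.normal(size=n)

# Fit PCA once, then apply the same projection to current and next states.
mean, comps = fit_pca(high, k)
states = np.hstack([low, project(high, mean, comps)])
next_states = np.hstack([low_next, project(high_next, mean, comps)])

w = fitted_q_iteration(states, actions, rewards, next_states, n_actions)
policy = greedy_action(w, states)
```

The point of the sketch is the design choice of reducing the high-frequency block before it enters the Q-function: the state dimension drops from 51 to 7 here, while the low-frequency covariates pass through unchanged.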