Ambiguous Dynamic Treatment Regimes: A Reinforcement Learning Approach

A central goal in many studies is to use an observational data set to derive a new set of counterfactual guidelines that can yield causal improvements. Dynamic Treatment Regimes (DTRs) are widely studied to formalize this process. However, available methods for finding optimal DTRs often rely on assumptions that are violated in real-world applications (e.g., medical decision-making or public policy), especially when (a) the existence of unobserved confounders cannot be ignored, and (b) the unobserved confounders are time-varying (e.g., affected by previous actions). When such assumptions are violated, one often faces ambiguity regarding the underlying causal model. This ambiguity is inevitable, since the dynamics of unobserved confounders and their causal impact on the observed part of the data cannot be understood from the observed data. Motivated by a case study of finding superior treatment regimes for patients who underwent transplantation in our partner hospital and faced a medical condition known as New Onset Diabetes After Transplantation (NODAT), we extend DTRs to a new class termed Ambiguous Dynamic Treatment Regimes (ADTRs), in which the causal impact of treatment regimes is evaluated based on a "cloud" of causal models. We then connect ADTRs to Ambiguous Partially Observable Markov Decision Processes (APOMDPs) and develop Reinforcement Learning methods that use the observed data to efficiently learn an optimal treatment regime. We establish theoretical results for these learning methods, including (weak) consistency and asymptotic normality. We further evaluate the performance of these learning methods both in our case study and in simulation experiments.
