We consider off-policy evaluation (OPE) in continuous treatment settings, such as personalized dose-finding. In OPE, one aims to estimate the mean outcome under a new treatment decision rule using historical data generated by a different decision rule. Most existing works on OPE focus on discrete treatment settings. To handle continuous treatments, we develop a novel estimation method for OPE using deep jump learning. The key ingredient of our method lies in adaptively discretizing the treatment space using deep discretization, which leverages deep learning and multi-scale change point detection. This allows us to apply existing OPE methods for discrete treatments to handle continuous treatments. Our method is further justified by theoretical results, simulations, and a real application to warfarin dosing.
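Purely as an illustration of the recipe summarized above, and not the authors' estimator, the sketch below discretizes a one-dimensional treatment space with a simple penalized change-point segmentation and plugs the resulting intervals into a direct-method OPE estimate. The function names (`segment_treatment_space`, `direct_method_value`), the grid resolution and penalty, the per-interval sample means (standing in for fitted outcome models), and the constant target policy in the demo are all illustrative assumptions.

```python
# Illustrative sketch only (not the paper's implementation): discretize a
# one-dimensional treatment space via penalized change-point segmentation,
# then plug the intervals into a simple direct-method OPE estimate.
import numpy as np

def segment_treatment_space(a, y, n_grid=20, penalty=0.5):
    """Partition [0, 1] into intervals by dynamic programming.
    The within-interval cost is the residual sum of squares around the
    interval mean, a crude stand-in for the per-interval outcome models
    one would fit in practice."""
    grid = np.linspace(0.0, 1.0, n_grid + 1)

    def cost(lo, hi):
        mask = (a >= grid[lo]) & (a < grid[hi]) if hi < n_grid else (a >= grid[lo])
        return float(np.sum((y[mask] - y[mask].mean()) ** 2)) if mask.any() else 0.0

    dp = np.full(n_grid + 1, np.inf)   # dp[j]: best penalized cost of segmenting grid[0:j]
    back = np.zeros(n_grid + 1, dtype=int)
    dp[0] = 0.0
    for j in range(1, n_grid + 1):
        for i in range(j):
            c = dp[i] + cost(i, j) + penalty
            if c < dp[j]:
                dp[j], back[j] = c, i
    cuts, j = [], n_grid
    while j > 0:                        # recover interval boundaries
        cuts.append((grid[back[j]], grid[j]))
        j = back[j]
    return cuts[::-1]

def direct_method_value(x, a, y, pi, intervals):
    """Direct-method OPE: estimate the mean outcome within each interval
    (here a sample mean instead of a fitted regression) and average over
    the interval that the target policy's recommended dose falls into."""
    means = []
    for lo, hi in intervals:
        mask = (a >= lo) & (a < hi) if hi < 1.0 else (a >= lo)
        means.append(y[mask].mean() if mask.any() else y.mean())
    values = []
    for xi in x:
        dose = pi(xi)                   # dose recommended by the target policy
        k = len(intervals) - 1
        for i, (lo, hi) in enumerate(intervals):
            if lo <= dose < hi:
                k = i
                break
        values.append(means[k])
    return float(np.mean(values))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    x = rng.uniform(size=n)             # covariate (unused by the toy outcome model)
    a = rng.uniform(size=n)             # behaviour policy: doses drawn uniformly on [0, 1]
    y = np.where(a < 0.5, 1.0, 2.0) + 0.1 * rng.standard_normal(n)  # jump at dose 0.5
    intervals = segment_treatment_space(a, y)
    value = direct_method_value(x, a, y, pi=lambda xi: 0.8, intervals=intervals)
    print(intervals)                    # should recover a cut near dose 0.5
    print(value)                        # should be close to 2.0
```

In this toy setting the segmentation recovers the jump in the outcome surface at dose 0.5, and the direct-method estimate of a policy that always recommends dose 0.8 is close to the true value of 2.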