Policy makers typically face the problem of wanting to estimate the long-term effects of novel treatments while only having historical data on older treatment options. We assume access to a long-term dataset in which only past treatments were administered and a short-term dataset in which novel treatments have been administered. We propose a surrogate-based approach in which we assume that the long-term effect is channeled through a multitude of available short-term proxies. Our work combines three major recent techniques from the causal machine learning literature (surrogate indices, dynamic treatment effect estimation, and double machine learning) in a unified pipeline. We show that our method is consistent and provides root-n asymptotically normal estimates under a Markovian assumption on the data and the observational policy. We use a dataset from a major corporation that includes customer investments over a three-year period to create a semi-synthetic data distribution in which the major qualitative properties of the real dataset are preserved. We evaluate the performance of our method and discuss the practical challenges of deploying our formal methodology and how to address them.
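To make the surrogate idea concrete, here is a minimal, hypothetical Python (NumPy) sketch that chains a surrogate index with a cross-fitted double machine learning estimate in a single, static period. It is not the authors' dynamic, multi-period estimator. The data-generating process, the function names `simulate`, `surrogate_index`, and `dml_plr`, the use of linear regressions as nuisance models, and the randomized treatment in the short-term data are all illustrative assumptions. The long-term outcome depends on the treatment only through the surrogates, so the surrogacy condition holds by construction.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients, with an intercept column prepended to X."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def ols_predict(coef, X):
    """Predictions from coefficients returned by ols_fit."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def simulate(n, rng, effect_on_s):
    """Hypothetical data: treatment T moves surrogates S; Y depends on T only via S."""
    X = rng.normal(size=(n, 2))
    T = rng.binomial(1, 0.5, size=n).astype(float)  # randomized binary treatment
    S = np.column_stack([
        effect_on_s * T + X[:, 0] + rng.normal(size=n),
        0.5 * effect_on_s * T + X[:, 1] + rng.normal(size=n),
    ])
    Y = 2.0 * S[:, 0] + 1.0 * S[:, 1] + X[:, 0] + rng.normal(size=n)
    return X, T, S, Y


def surrogate_index(S_long, X_long, Y_long):
    """Fit h(S, X) ~ E[Y | S, X] on the long-term dataset; return it as a function."""
    coef = ols_fit(np.column_stack([S_long, X_long]), Y_long)
    return lambda S, X: ols_predict(coef, np.column_stack([S, X]))


def dml_plr(y, t, X, n_folds=2, seed=0):
    """Cross-fitted DML estimate of theta in the partially linear model y = theta*t + g(X) + e."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    folds = np.random.default_rng(seed).permutation(len(y)) % n_folds
    y_res, t_res = np.empty_like(y), np.empty_like(t)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Nuisance models are fit on the other folds and evaluated out of fold.
        y_res[test] = y[test] - ols_predict(ols_fit(X[train], y[train]), X[test])
        t_res[test] = t[test] - ols_predict(ols_fit(X[train], t[train]), X[test])
    theta = np.sum(t_res * y_res) / np.sum(t_res ** 2)
    # Sandwich standard error from the orthogonal score (y_res - theta*t_res) * t_res.
    psi = (y_res - theta * t_res) * t_res
    se = np.sqrt(np.mean(psi ** 2) / np.mean(t_res ** 2) ** 2 / len(y))
    return theta, se


rng = np.random.default_rng(42)
# Long-term data: an older treatment (weaker effect on S), long-term outcome Y observed.
X_long, _, S_long, Y_long = simulate(5000, rng, effect_on_s=0.3)
# Short-term data: the novel treatment, long-term outcome treated as unobserved.
X_short, T_short, S_short, _ = simulate(5000, rng, effect_on_s=1.0)

h = surrogate_index(S_long, X_long, Y_long)
theta, se = dml_plr(h(S_short, X_short), T_short, X_short)
# By construction the true long-term effect of the novel treatment is 2*1.0 + 1*0.5 = 2.5.
print(f"estimated long-term effect: {theta:.2f} (se {se:.2f})")
```

Linear nuisance models keep the sketch dependency-light; any flexible learner could be substituted in `ols_fit`/`ols_predict`. Note that the reported standard error treats the fitted surrogate index as fixed, so it ignores the estimation error coming from the long-term dataset.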