We consider the problem of learning an optimal prescriptive tree (i.e., a personalized treatment assignment policy in the form of a binary tree) of moderate depth from observational data. This problem arises in numerous socially important domains, such as public health and personalized medicine, where interpretable, data-driven interventions are sought based on data collected passively in deployment rather than from randomized trials. We propose a method for learning optimal prescriptive trees using mixed-integer optimization (MIO). We show that, under mild conditions, our method is asymptotically exact in the sense that it converges to an optimal out-of-sample treatment assignment policy as the number of historical data samples tends to infinity. This sets our approach apart from the existing literature on the topic, which either requires data to be randomized or imposes stringent assumptions on the trees. Extensive computational experiments on both synthetic and real data demonstrate that our asymptotic guarantees translate into significant out-of-sample performance improvements even in finite samples.
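To make the object of study concrete, the following minimal Python sketch represents a prescriptive tree as a binary tree whose leaves prescribe treatments, and scores it on observational data with an inverse propensity weighting (IPW) estimate. This is an illustration under stated assumptions, not the MIO formulation proposed here: the abstract does not specify an estimator, so the choice of IPW, the names `Leaf`, `Branch`, `prescribe`, and `ipw_value`, the toy data, and the assumption that propensity scores are known are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    treatment: int  # treatment prescribed to every unit reaching this leaf


@dataclass
class Branch:
    feature: int      # index of the covariate tested at this node
    threshold: float  # units with x[feature] <= threshold go left
    left: "Node"
    right: "Node"


Node = Union[Leaf, Branch]


def prescribe(tree: Node, x: Sequence[float]) -> int:
    """Route one covariate vector down the tree and return its treatment."""
    node = tree
    while isinstance(node, Branch):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.treatment


def ipw_value(tree, X, T, Y, propensity) -> float:
    """IPW estimate of the mean outcome if the tree's policy were deployed.

    Assumes propensity[i] = P(T = T[i] | X[i]) is known and positive;
    in practice it would have to be estimated from the data.
    """
    total = 0.0
    for x, t, y, p in zip(X, T, Y, propensity):
        # Only units whose observed treatment matches the policy contribute.
        if prescribe(tree, x) == t:
            total += y / p
    return total / len(X)


# Hypothetical toy data: one covariate, two treatments, uniform propensities.
X = [[0.2], [0.8], [0.3], [0.9]]
T = [0, 1, 1, 0]
Y = [1.0, 2.0, 0.5, 0.0]
propensity = [0.5, 0.5, 0.5, 0.5]

# Depth-1 tree: prescribe treatment 0 if x[0] <= 0.5, else treatment 1.
tree = Branch(feature=0, threshold=0.5, left=Leaf(0), right=Leaf(1))
value = ipw_value(tree, X, T, Y, propensity)
```

Learning an optimal tree then amounts to searching over branching features, thresholds, and leaf treatments for the tree that maximizes such an estimate. The sketch only evaluates one fixed tree and does not perform that search.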