Finding an effective medical treatment often requires a search by trial and error. Making this search more efficient by minimizing the number of unnecessary trials could lower both costs and patient suffering. We formalize this problem as learning a policy for finding a near-optimal treatment in a minimum number of trials using a causal inference framework. We give a model-based dynamic programming algorithm which learns from observational data while being robust to unmeasured confounding. To reduce time complexity, we suggest a greedy algorithm which bounds the near-optimality constraint. The methods are evaluated on synthetic and real-world healthcare data and compared to model-free reinforcement learning. We find that our methods compare favorably to the model-free baseline while offering a more transparent trade-off between search time and treatment efficacy.