Methods for learning optimal policies use causal machine learning models to create human-interpretable rules for allocating different policy interventions. However, in realistic policy-making contexts, decision-makers often care about trade-offs between outcomes, not just single-mindedly maximising utility for one outcome. This paper proposes an approach, termed Multi-Objective Policy Learning (MOPoL), which combines optimal decision trees for policy learning with multi-objective Bayesian optimisation to explore the trade-offs between multiple outcomes. It does so by building a Pareto frontier of non-dominated models across different hyperparameter settings. The key insight is that a low-cost surrogate function can be an accurate proxy, in terms of expected regret, for the computationally costly optimal tree; the surrogate can therefore be refit many times under different hyperparameter values to approximate the optimal model's performance. The method is applied to a real-world case study of conditional cash transfers in Morocco, where hybrid (partially optimal, partially greedy) policy trees perform well as surrogates for optimal trees while remaining computationally cheap enough to make fitting a Pareto frontier feasible.
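To make the frontier-building step concrete, the following is a minimal Python sketch of the non-dominated (Pareto) filter the abstract describes, applied to per-outcome regrets from a cheap surrogate. Here `fit_surrogate_regret`, the scalarisation weight `lam`, the tree `depth`, and all numeric values are hypothetical stand-ins for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_surrogate_regret(lam, depth):
    """Stand-in for fitting a cheap hybrid policy tree and estimating
    its expected regret on each of two outcomes (hypothetical mock).
    In MOPoL this step would fit a partially greedy policy tree under
    the given hyperparameters and report per-outcome regrets."""
    # Mock trade-off: lowering regret on outcome 1 raises it on outcome 2.
    noise = rng.normal(scale=0.02, size=2)
    r1 = (1 - lam) ** 2 / depth + noise[0]
    r2 = lam ** 2 / depth + noise[1]
    return np.array([r1, r2])

def pareto_front(points):
    """Indices of non-dominated points, where lower regret is better
    on every outcome."""
    pts = np.asarray(points)
    keep = []
    for i, p in enumerate(pts):
        # p is dominated if some other point is no worse on all
        # objectives and strictly better on at least one.
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

# Evaluate the cheap surrogate over a grid of hyperparameter settings;
# a full multi-objective Bayesian optimisation loop would instead
# propose these settings adaptively from a probabilistic model.
settings = [(lam, depth) for lam in np.linspace(0, 1, 11) for depth in (2, 3)]
regrets = [fit_surrogate_regret(lam, d) for lam, d in settings]

for i in pareto_front(regrets):
    lam, d = settings[i]
    print(f"lambda={lam:.1f}, depth={d}, regrets={np.round(regrets[i], 3)}")
```

Because the surrogate is cheap to fit, the filter can be run over many hyperparameter settings; the printed non-dominated set is the Pareto frontier a decision-maker would inspect to choose a preferred trade-off between outcomes.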