Off-policy policy evaluation methods for sequential decision making can help determine whether a proposed decision policy is better than a current baseline policy. However, a new decision policy may be better than the baseline for some individuals but not for others. This has motivated a push towards personalization and accurate per-state estimates of heterogeneous treatment effects (HTEs). Given the limited data available in many important applications, individual-level predictions can come at a cost to both the accuracy of and the confidence in those predictions. We develop a method that balances the need for personalization with the need for confident predictions by identifying subgroups for which it is possible to confidently estimate the expected difference in a new decision policy relative to a baseline. We propose a novel loss function that accounts for uncertainty during the subgroup partitioning phase. In experiments, we show that our method can form accurate predictions of HTEs where other methods struggle.