We consider the problem of evaluating the performance of a decision policy using past observational data. The outcome of a policy is measured in terms of a loss (also known as disutility or negative reward), and the main problem is to make valid inferences about the policy's out-of-sample loss when the past data were observed under a different and possibly unknown policy. Using a sample-splitting method, we show that it is possible to draw such inferences with finite-sample coverage guarantees about the entire loss distribution, rather than just its mean. Importantly, the method takes into account misspecification of the model of the past policy, including unmeasured confounding. The evaluation method can be used to certify the performance of a policy using observational data under a specified range of credible model assumptions.