We study policy evaluation for offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under worst-case confounding over a given uncertainty set. However, existing work often resorts to a coarse relaxation of the uncertainty set for the sake of tractability, which leads to overly conservative estimates of the policy value. In this paper, we propose a general estimator that provides a sharp lower bound on the policy value. It can be shown that our estimator contains the recently proposed sharp estimator of Dorn and Guo (2022) as a special case, and our method enables a novel extension of the classical marginal sensitivity model using f-divergence. To construct our estimator, we leverage the kernel method to obtain a tractable approximation to the conditional moment constraints, which traditional non-sharp estimators fail to take into account. In the theoretical analysis, we provide a condition on the choice of kernel that guarantees the absence of specification error that would bias the lower bound estimate. Furthermore, we provide consistency guarantees for policy evaluation and learning. In experiments with synthetic and real-world data, we demonstrate the effectiveness of the proposed method.
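To make the "coarse relaxation" concrete, the following is a minimal sketch of the kind of non-sharp baseline the abstract contrasts against, not the estimator proposed in the paper. It assumes the classical marginal sensitivity model with an odds-ratio parameter `lam >= 1`, an unnormalized inverse propensity weighting objective, and box constraints on each weight only; the conditional moment constraints are deliberately ignored. The function names `msm_weight_bounds` and `relaxed_lower_bound` are hypothetical and chosen for illustration.

```python
def msm_weight_bounds(propensity, lam):
    """Box bounds on the true inverse propensity weight.

    Assumes the marginal sensitivity model: the odds of the true propensity
    differ from the odds of the nominal propensity by a factor in [1/lam, lam].
    """
    inv_odds = 1.0 / propensity - 1.0  # (1 - e) / e for nominal propensity e
    return 1.0 + inv_odds / lam, 1.0 + inv_odds * lam


def relaxed_lower_bound(rewards, target_probs, propensities, lam):
    """Worst-case unnormalized IPW value over the box constraints alone.

    Each weight moves independently inside its box, so this relaxation
    ignores the conditional moment constraints and is generally not sharp.
    """
    total = 0.0
    for r, pi, e in zip(rewards, target_probs, propensities):
        lo, hi = msm_weight_bounds(e, lam)
        contrib = pi * r
        # A linear objective over a box is minimized at a corner:
        # the smallest weight for nonnegative terms, the largest otherwise.
        w = lo if contrib >= 0 else hi
        total += w * contrib
    return total / len(rewards)


# Illustrative toy data (not from the paper).
rewards = [1.0, -1.0]
target_probs = [1.0, 1.0]   # probability the evaluated policy takes the logged action
propensities = [0.5, 0.5]   # nominal logging propensities

print(relaxed_lower_bound(rewards, target_probs, propensities, lam=1.0))  # 0.0
print(relaxed_lower_bound(rewards, target_probs, propensities, lam=2.0))  # -0.75
```

With `lam = 1` the box collapses and the bound reduces to the ordinary IPW estimate. Larger `lam` widens every box independently, which is where the conservativeness of this relaxation comes from.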