Generalizing Off-Policy Learning under Sample Selection Bias

Learning personalized decision policies that generalize to the target population is of great relevance. Since training data is often not representative of the target population, standard policy learning methods may yield policies that do not generalize to the target population. To address this challenge, we propose a novel framework for learning policies that generalize to the target population. For this, we characterize the difference between the training data and the target population as a sample selection bias using a selection variable. Over an uncertainty set around this selection variable, we optimize the minimax value of a policy to achieve the best worst-case policy value on the target population. To solve the minimax problem, we derive an efficient algorithm based on a convex-concave procedure and prove convergence for parametrized spaces of policies such as logistic policies. We prove that, if the uncertainty set is well-specified, our policies generalize to the target population, as they cannot do worse on the target population than on the training data. Using simulated data and a clinical trial, we demonstrate that, compared to standard policy learning methods, our framework substantially improves the generalizability of policies.
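
To make the worst-case policy value concrete, the following is a minimal sketch under stated assumptions, not the algorithm from the paper. It assumes the uncertainty set is a simple box: each training sample `i` has an unknown selection weight between `w_lo[i]` and `w_hi[i]`, and the policy value is a self-normalized weighted mean of per-sample scores (for example, inverse-propensity-weighted rewards under a fixed policy). The function name `worst_case_value` and all inputs are hypothetical. The inner minimization is a linear-fractional program over a box, so it is enough to scan the configurations that place the upper bound on the lowest scores and the lower bound on the rest.

```python
import numpy as np

def worst_case_value(scores, w_lo, w_hi):
    """Minimize sum(w * scores) / sum(w) subject to w_lo <= w <= w_hi.

    Assumes every entry of w_lo is strictly positive so the denominator
    cannot vanish. The box-shaped uncertainty set is an illustrative
    assumption, not the set defined in the paper.
    """
    scores = np.asarray(scores, dtype=float)
    w_lo = np.asarray(w_lo, dtype=float)
    w_hi = np.asarray(w_hi, dtype=float)

    # Sort samples from lowest to highest score.
    order = np.argsort(scores)
    s, lo, hi = scores[order], w_lo[order], w_hi[order]

    best = np.inf
    # An adversary up-weights the k lowest-scoring samples and
    # down-weights the rest; try every threshold k.
    for k in range(len(s) + 1):
        w = np.concatenate([hi[:k], lo[k:]])
        best = min(best, float(np.dot(w, s) / w.sum()))
    return best

# Hypothetical per-sample scores of one fixed policy.
example_scores = np.array([1.0, 0.0, 1.0, 0.0])
example_lo = np.full(4, 0.5)
example_hi = np.full(4, 2.0)
example_value = worst_case_value(example_scores, example_lo, example_hi)
```

In a full minimax procedure, an outer loop would adjust the policy parameters to maximize this worst-case value; that outer step, which the abstract describes as a convex-concave procedure, is not shown here.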
