This study proposes an end-to-end algorithm for policy learning in causal inference. We observe data consisting of covariates, treatment assignments, and outcomes, where only the outcome corresponding to the assigned treatment is observed. The goal of policy learning is to train a policy from the observed data so as to maximize the policy value, where a policy is a function that recommends a treatment for each individual. In this study, we first show that maximizing the policy value is equivalent to minimizing the mean squared error for the conditional average treatment effect (CATE) under regression models restricted to $\{-1, 1\}$. Based on this finding, we modify the causal forest, an end-to-end CATE estimation algorithm, for policy learning. We refer to our algorithm as the causal-policy forest. Our algorithm has three advantages. First, it is a simple modification of an existing, widely used CATE estimation method, and therefore helps bridge the gap between policy learning and CATE estimation in practice. Second, while existing studies typically estimate nuisance parameters for policy learning as a separate task, our algorithm trains the policy in a more end-to-end manner. Third, as with standard decision trees and random forests, the models can be trained efficiently, avoiding computational intractability.
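The equivalence claimed above can be checked numerically. The sketch below is not the paper's causal-policy forest; it is a minimal illustration, under an assumed randomized data-generating process, that for predictions restricted to $\{-1, 1\}$ the ordering of policies by squared error against inverse-propensity-weighted scores is the reverse of their ordering by estimated policy value, since $(\Gamma - \pi)^2 = \Gamma^2 - 2\Gamma\pi + 1$ when $\pi^2 = 1$.

```python
import numpy as np

# Hypothetical data-generating process: one covariate, randomized treatment,
# and a treatment effect that changes sign at x = 0.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1, 1, size=n)
tau = np.where(x > 0, 1.0, -1.0)              # true CATE (assumed for this demo)
t = rng.binomial(1, 0.5, size=n)              # treatment assignment, propensity 0.5
y = tau * t + rng.normal(scale=0.5, size=n)   # only the assigned outcome is observed

# Inverse-propensity-weighted score: its product with a policy pi in {-1, +1}
# has expectation proportional to the policy value contrast.
gamma = (t / 0.5 - (1 - t) / 0.5) * y

def mse(pi):
    # Squared error of a {-1, +1}-restricted prediction against the scores.
    return np.mean((gamma - pi) ** 2)

def value(pi):
    # IPW estimate of the policy value contrast.
    return np.mean(gamma * pi)

pi_a = np.where(x > 0, 1, -1)   # treat when x > 0
pi_b = np.where(x > 0, -1, 1)   # the opposite policy

# Lower squared error should coincide with higher estimated policy value.
print(mse(pi_a) < mse(pi_b), value(pi_a) > value(pi_b))  # expected: True True
```

This only demonstrates the algebraic identity behind the reduction; the paper's contribution is to exploit it inside the splitting and estimation steps of a causal forest rather than in a post hoc check like this one.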