It is often critical for prediction models to be robust to distributional shifts between training and testing data. From a causal perspective, the challenge is to distinguish stable causal relationships from unstable spurious correlations across such shifts. We describe a causal transfer random forest (CTRF) that combines existing training data with a small amount of data from a randomized experiment to train a model that is robust to feature shifts and therefore transfers to a new target distribution. Theoretically, we justify the robustness of the approach against feature shifts using insights from causal learning. Empirically, we evaluate the CTRF in both synthetic-data experiments and real-world experiments on the Bing Ads platform, including a click-prediction task and an end-to-end counterfactual optimization system. The proposed CTRF produces robust predictions and outperforms most of the compared baseline methods in the presence of feature shifts.
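The core idea, combining a large logged dataset (where a spurious feature correlates with the label) with a small randomized sample (where it does not), can be illustrated with a loose sketch. This is not the CTRF algorithm itself, which modifies the tree-splitting procedure; here we merely approximate the intuition by pooling both datasets and upweighting the randomized sample, using a standard random forest. All dataset shapes, feature definitions, and weights below are illustrative assumptions.

```python
# Illustrative sketch only, NOT the authors' CTRF algorithm: we approximate
# the idea of combining logged data with a small upweighted randomized sample.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def make_data(n, spurious):
    """x1 is a stable causal feature; x2 is spurious in logged data only."""
    y = rng.integers(0, 2, n)
    x1 = y + rng.normal(0.0, 0.5, n)       # causal signal, survives the shift
    if spurious:
        x2 = y + rng.normal(0.0, 0.1, n)   # tracks y tightly in logged data
    else:
        x2 = rng.normal(0.5, 1.0, n)       # independent of y after the shift
    return np.column_stack([x1, x2]), y

X_log, y_log = make_data(2000, spurious=True)     # large logged dataset
X_rand, y_rand = make_data(200, spurious=False)   # small randomized sample
X_test, y_test = make_data(2000, spurious=False)  # shifted target distribution

# Baseline: a forest trained on logged data alone latches onto x2
# and degrades when x2's correlation with y disappears at test time.
rf_log = RandomForestClassifier(n_estimators=100, random_state=0)
rf_log.fit(X_log, y_log)
acc_logged = rf_log.score(X_test, y_test)

# Combined: pool both datasets, upweighting the randomized sample so that
# splitting on the spurious x2 no longer pays off during training.
X_all = np.vstack([X_log, X_rand])
y_all = np.concatenate([y_log, y_rand])
w_all = np.concatenate([np.ones(len(y_log)), 20.0 * np.ones(len(y_rand))])
rf_comb = RandomForestClassifier(n_estimators=100, random_state=0)
rf_comb.fit(X_all, y_all, sample_weight=w_all)
acc_combined = rf_comb.score(X_test, y_test)
```

With a fixed seed, the combined model recovers much of the accuracy lost to the feature shift, while the logged-only model remains close to chance because it relies on the spurious feature.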