This study presents a systematic comparison of methods for individual treatment assignment, a general problem that arises in many applications and has received significant attention from economists, computer scientists, and social scientists. We group the various methods proposed in the literature into three general classes of algorithms (or metalearners): learning models to predict outcomes (the O-learner), learning models to predict causal effects (the E-learner), and learning models to predict optimal treatment assignments (the A-learner). We compare the metalearners in terms of (1) their level of generality and (2) the objective function they use to learn models from data; we then discuss the implications that these characteristics have for modeling and decision making. Notably, we demonstrate analytically and empirically that optimizing for the prediction of outcomes or causal effects is not the same as optimizing for treatment assignments, suggesting that in general the A-learner should lead to better treatment assignments than the other metalearners. We demonstrate the practical implications of our findings in the context of choosing, for each user, the best algorithm for playlist generation in order to optimize engagement. This is the first comparison of the three different metalearners on a real-world application at scale (based on more than half a billion individual treatment assignments). In addition to supporting our analytical findings, the results show how large A/B tests can provide substantial value for learning treatment assignment policies, rather than simply choosing the variant that performs best on average.
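To make the distinction between the three metalearners concrete, here is a minimal Python sketch on synthetic data. It does not use the models or data from this study. The sketch assumes:

- a single user feature x;
- a binary treatment randomized with probability p = 0.5, as in an A/B test;
- a known ground truth in which the treatment helps only users with x > 0.

All function names are hypothetical. The three metalearners are implemented as follows:

- The O-learner fits one outcome model per arm and treats whenever the predicted outcome under treatment is higher.
- The E-learner fits a model of the causal effect to the transformed outcome z = y(t - p)/(p(1 - p)), whose conditional mean equals the individual effect under randomization.
- The A-learner skips both kinds of prediction. It directly searches simple threshold policies for the one that maximizes the estimated gain from the assignment itself, mean(z * pi(x)). This is an empirical welfare maximization stand-in, chosen here for brevity.

```python
import numpy as np


def simulate_ab_test(n, p=0.5, seed=0):
    """Synthetic randomized A/B test (an assumption for illustration).

    x: one user feature in [-1, 1]; t: treatment drawn with P(t = 1) = p.
    Assumed ground truth: E[y | x, t] = 1 + 0.5 * x + t * x, so the
    individual effect is tau(x) = x and the best policy treats x > 0.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    t = (rng.uniform(size=n) < p).astype(int)
    y = 1.0 + 0.5 * x + t * x + rng.normal(0.0, 0.5, size=n)
    return x, t, y


def fit_linear(x, target):
    """Least-squares fit of target ~ a + b * x; returns the array [a, b]."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef


def transformed_outcome(t, y, p):
    """z with E[z | x] = tau(x) when t is randomized with P(t = 1) = p."""
    return y * (t - p) / (p * (1.0 - p))


def o_learner(x, t, y):
    """O-learner: one outcome model per arm; treat if predicted y(1) > y(0)."""
    a0, b0 = fit_linear(x[t == 0], y[t == 0])
    a1, b1 = fit_linear(x[t == 1], y[t == 1])
    return lambda xs: ((a1 + b1 * xs) > (a0 + b0 * xs)).astype(int)


def e_learner(x, t, y, p=0.5):
    """E-learner: effect model fit to z; treat if the predicted effect > 0."""
    a, b = fit_linear(x, transformed_outcome(t, y, p))
    return lambda xs: ((a + b * xs) > 0).astype(int)


def a_learner(x, t, y, p=0.5):
    """A-learner stand-in: pick the threshold policy maximizing mean(z * pi(x))."""
    z = transformed_outcome(t, y, p)
    best = (-np.inf, 0.0, 1)  # (estimated gain, threshold, direction)
    for c in np.linspace(-1.0, 1.0, 201):
        for sign in (1, -1):
            pi = (sign * (x - c) > 0).astype(int)
            gain = np.mean(z * pi)
            if gain > best[0]:
                best = (gain, c, sign)
    _, c, sign = best
    return lambda xs: (sign * (xs - c) > 0).astype(int)


def true_policy_value(policy):
    """Expected outcome under the assumed ground truth, averaged over a grid of x."""
    xs = np.linspace(-1.0, 1.0, 2001)
    return float(np.mean(1.0 + 0.5 * xs + policy(xs) * xs))


if __name__ == "__main__":
    x, t, y = simulate_ab_test(n=20_000)
    for name, learner in [("O", o_learner), ("E", e_learner), ("A", a_learner)]:
        print(name, round(true_policy_value(learner(x, t, y)), 3))
```

On this deliberately well-specified toy problem, all three learners should recover a policy close to "treat users with x > 0". The sketch only illustrates how each metalearner turns A/B-test data into an assignment policy. It does not reproduce the study's finding that the A-learner generally leads to better treatment assignments.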