There is growing interest in allocating treatments based on observed individual data; examples include heterogeneous pricing, individualized credit offers, and targeted social programs. Personalized policies create incentives for individuals to modify their behavior to obtain a better treatment. We show that standard risk-minimization-based estimators are sub-optimal when observed covariates are endogenous to the treatment allocation rule. We propose a dynamic experiment that converges to the optimal treatment allocation function without parametric assumptions on individual strategic behavior, and prove that its regret decays at a linear rate. We validate the method in simulations and in a small MTurk experiment.