The increasing availability of individual-level data has led to numerous applications of individualized (or personalized) treatment rules (ITRs). Policy makers often wish to empirically evaluate ITRs and compare their relative performance before implementing them in a target population. We propose a new evaluation metric, the population average prescriptive effect (PAPE). The PAPE compares the performance of an ITR with that of a non-individualized treatment rule that randomly treats the same proportion of units. Averaging the PAPE over a range of budget constraints yields our second evaluation metric, the area under the prescriptive effect curve (AUPEC). The AUPEC serves as an overall performance measure for evaluation, much as the area under the receiver operating characteristic curve (AUROC) does for classification, and it generalizes the QINI coefficient used in uplift modeling. We use Neyman's repeated sampling framework to estimate the PAPE and AUPEC and derive their exact finite-sample variances based on random sampling of units and random assignment of treatment. We extend our methodology to a common setting in which the same experimental data are used both to estimate and to evaluate ITRs. In this case, our variance calculation incorporates the additional uncertainty due to the random splits of data used for cross-validation. The proposed evaluation metrics can be estimated without modeling assumptions, asymptotic approximations, or resampling methods. As a result, they are applicable to any ITR, including those based on complex machine learning algorithms. An open-source software package implementing the proposed methodology is available.
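The PAPE contrasts the value of an ITR with the value of a rule that treats a random share of units equal to the share the ITR treats. The sketch below illustrates that comparison with a simple plug-in estimate in Python. It assumes a completely randomized experiment with a binary treatment and a binary ITR whose recommendations f(X_i) are already computed. The function name `pape_plugin` is hypothetical. The sketch does not reproduce the exact finite-sample estimator or variance derived in the paper, and it does not cover the budget-constrained rules that underlie the AUPEC.

```python
from typing import Sequence


def pape_plugin(y: Sequence[float], t: Sequence[int], f: Sequence[int]) -> float:
    """Hypothetical plug-in estimate of the PAPE (illustrative sketch only).

    y: observed outcomes
    t: treatment indicators (0/1) from a completely randomized experiment
    f: ITR recommendations f(X_i) (0/1) for each unit
    """
    if not (len(y) == len(t) == len(f)):
        raise ValueError("y, t, and f must have the same length")
    n = len(y)
    treated = [i for i in range(n) if t[i] == 1]
    control = [i for i in range(n) if t[i] == 0]
    n1, n0 = len(treated), len(control)
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one treated and one control unit")

    # Estimated value of the ITR: outcomes of treated units the rule would treat,
    # plus outcomes of control units the rule would leave untreated,
    # with each arm's sum scaled by that arm's size.
    value_itr = (sum(y[i] * f[i] for i in treated) / n1
                 + sum(y[i] * (1 - f[i]) for i in control) / n0)

    # Proportion of units the ITR treats.
    p_f = sum(f) / n

    # Value of the non-individualized rule that treats a random p_f share of units.
    mean_treated = sum(y[i] for i in treated) / n1
    mean_control = sum(y[i] for i in control) / n0
    value_random = p_f * mean_treated + (1 - p_f) * mean_control

    return value_itr - value_random


# Toy data: three treated and three control units; the ITR treats half of them.
y = [3, 1, 4, 2, 5, 0]
t = [1, 1, 1, 0, 0, 0]
f = [1, 0, 1, 1, 0, 0]
print(pape_plugin(y, t, f))  # 1.5
```

A rule that treats everyone (or no one) has the same treated share as its random counterpart, so its estimated PAPE is zero. This reflects the metric's focus on the gain from individualization rather than on the overall treatment effect.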