Black-box risk scoring models permeate our lives, yet are typically proprietary or opaque. We propose a transparent model distillation approach to audit such models. Model distillation was first introduced to transfer knowledge from a large, complex teacher model to a faster, simpler student model without significant loss in prediction accuracy. To this we add a third criterion - transparency. To gain insight into black-box models, we treat them as teachers, training transparent student models to mimic the risk scores assigned by the teacher. Moreover, we use side information in the form of the actual outcomes the teacher scoring model was intended to predict in the first place. By training a second transparent model on the outcomes, we can compare the two models to each other. When comparing models trained on risk scores to models trained on outcomes, we show that it is necessary to calibrate the risk-scoring model's predictions to remove distortion that may have been added to the black-box risk-scoring model during or after its training process. We also show how to compute confidence intervals for the particular class of transparent student models we use - tree-based additive models with pairwise interactions (GA2Ms) - to support comparison of the two transparent models. We demonstrate the methods on four public datasets: COMPAS, Lending Club, Stop-and-Frisk, and Chicago Police.
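The audit described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the black-box scorer and its distortion are synthetic, and the GA2M students are stood in for by scikit-learn gradient-boosted trees purely to keep the example self-contained.

```python
# Hedged sketch of the distillation audit. The black-box scorer below is
# hypothetical, and GradientBoostingRegressor is a stand-in for the
# transparent GA2M students used in the paper.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))

# Hypothetical black-box teacher: its internals are unknown to the auditor,
# and here it carries a deliberate +0.5 distortion relative to the outcome.
def black_box_score(X):
    return 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - X[:, 1] + 0.5)))

# The actual outcomes the teacher was originally intended to predict.
p_true = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - X[:, 1])))
y = rng.binomial(1, p_true)

scores = black_box_score(X)

# Student 1: transparent model trained to mimic the teacher's risk scores.
mimic = GradientBoostingRegressor(random_state=0).fit(X, scores)
# Student 2: transparent model trained on the actual outcomes.
outcome = GradientBoostingRegressor(random_state=0).fit(X, y)

# Comparing the two transparent students surfaces where the teacher's
# scores diverge from what the outcomes support (here, the injected shift).
gap = mimic.predict(X) - outcome.predict(X)
print(f"mean mimic-vs-outcome gap: {float(np.mean(gap)):.3f}")
```

In practice the comparison would use calibrated scores and the GA2M confidence intervals described in the paper; the point of the sketch is only the two-student structure of the audit.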