Black-box risk scoring models permeate our lives, yet are typically proprietary or opaque. We propose Distill-and-Compare, a model distillation and comparison approach to audit such models. To gain insight into black-box models, we treat them as teachers, training transparent student models to mimic the risk scores assigned by black-box models. We compare the student model trained with distillation to a second un-distilled transparent model trained on ground-truth outcomes, and use differences between the two models to gain insight into the black-box model. Our approach can be applied in a realistic setting, without probing the black-box model API. We demonstrate the approach on four public data sets: COMPAS, Stop-and-Frisk, Chicago Police, and Lending Club. We also propose a statistical test to determine if a data set is missing key features used to train the black-box model. Our test finds that the ProPublica data is likely missing key feature(s) used in COMPAS.
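The distill-and-compare idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the black-box scores are generated by a made-up rule and already rescaled to [0, 1], and the transparent model is a simple per-group mean table standing in for whatever transparent model class is actually used. The names `fit_group_means` and `compare` are hypothetical helpers. Note that the black-box scores are read from the data set, so the black-box model is never queried.

```python
import random
from collections import defaultdict


def fit_group_means(groups, targets):
    """Transparent stand-in model: predict the mean target within each group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, t in zip(groups, targets):
        sums[g] += t
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def compare(student, outcome_model):
    """Per-group gap: student (mimics the black box) minus outcome model."""
    return {g: student[g] - outcome_model[g] for g in student}


random.seed(0)

# Synthetic audit data (assumption): each record has a group label,
# a ground-truth outcome, and a risk score assigned by the black box.
groups, outcomes, scores = [], [], []
for _ in range(5000):
    g = random.choice(["A", "B"])
    # Both groups share the same true outcome rate of 0.3.
    y = 1 if random.random() < 0.3 else 0
    # Hypothetical black box: scores group B 0.2 higher than group A.
    s = 0.3 + (0.2 if g == "B" else 0.0)
    groups.append(g)
    outcomes.append(y)
    scores.append(s)

# Student: trained to mimic the black-box risk scores (distillation).
student = fit_group_means(groups, scores)
# Second model: same transparent class, trained on ground-truth outcomes.
outcome_model = fit_group_means(groups, outcomes)

# Differences between the two models hint at where the black box
# departs from what the observed outcomes alone would support.
gaps = compare(student, outcome_model)
for g in sorted(gaps):
    print(g, round(gaps[g], 3))
```

In this toy setup the gap for group B should come out noticeably larger than for group A, because the made-up black box inflates scores for B while the true outcome rates are identical. With real data the scores and outcomes sit on different scales, so some calibration step would be needed before the two models can be compared; that step is omitted here.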