Group fairness metrics can detect when a deep learning model behaves differently for advantaged and disadvantaged groups, but even models that score well on these metrics can make blatantly unfair predictions. We present smooth prediction sensitivity, an efficiently computed measure of individual fairness for deep learning models that is inspired by ideas from deep learning interpretability. Smooth prediction sensitivity allows individual predictions to be audited for fairness. We report preliminary experimental results suggesting that smooth prediction sensitivity can help distinguish between fair and unfair predictions, and that it may be helpful in detecting blatantly unfair predictions from "group-fair" models.
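To make the idea concrete, the sketch below shows one plausible way such a measure could be computed, assuming it follows SmoothGrad-style interpretability methods: average the magnitude of the gradient of the model's output with respect to a protected feature over noise-perturbed copies of the input. The function name, noise scale, sample count, and the gradient-based construction itself are illustrative assumptions, not the paper's definition.

```python
import torch

def smooth_prediction_sensitivity(model, x, protected_idx, n_samples=32, sigma=0.1):
    """Hypothetical sketch of a smoothed sensitivity score.

    Averages the absolute gradient of the model output with respect to a
    protected feature over noise-perturbed copies of the input, in the spirit
    of SmoothGrad. This is an illustrative guess at the construction, not the
    paper's exact measure.
    """
    grads = []
    for _ in range(n_samples):
        # Perturb the input with Gaussian noise and track gradients on the copy.
        noisy = (x + sigma * torch.randn_like(x)).detach().requires_grad_(True)
        out = model(noisy).sum()  # scalar prediction, e.g. positive-class score
        out.backward()
        # Gradient component along the (assumed) protected feature dimension.
        grads.append(noisy.grad[..., protected_idx].abs())
    # A high value suggests the prediction depends strongly on the protected feature.
    return torch.stack(grads).mean()
```

Under this reading, auditing an individual prediction amounts to computing this score for a single input and flagging cases where it is large relative to some reference distribution; the thresholding step, like the score itself, is an assumption for illustration.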