We study Individual Fairness (IF) for Bayesian neural networks (BNNs). Specifically, we consider the $\epsilon$-$\delta$-individual fairness notion, which requires that, for any pair of input points that are $\epsilon$-similar according to a given similarity metric, the outputs of the BNN differ by at most a given tolerance $\delta > 0$. We leverage bounds on statistical sampling over the input space, together with the relationship between adversarial robustness and individual fairness, to derive a framework for the systematic estimation of $\epsilon$-$\delta$-IF, designing Fair-FGSM and Fair-PGD as global, fairness-aware extensions of gradient-based attacks for BNNs. We empirically study the IF of a variety of approximately inferred BNNs with different architectures on fairness benchmarks, and compare them against deterministic models learned with frequentist techniques. Interestingly, we find that BNNs trained by means of approximate Bayesian inference are consistently and markedly more individually fair than their deterministic counterparts.
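For concreteness, one plausible formalization of this notion reads as follows (the notation here is ours, not taken verbatim from the paper: we write $d_{\mathrm{fair}}$ for the similarity metric and take the posterior predictive mean as the BNN output, which is one natural choice among several):
\[
d_{\mathrm{fair}}(x, x') \le \epsilon
\;\Longrightarrow\;
\bigl|\, \mathbb{E}_{w \sim p(w \mid \mathcal{D})}\!\left[ f^{w}(x) \right]
       - \mathbb{E}_{w \sim p(w \mid \mathcal{D})}\!\left[ f^{w}(x') \right] \bigr|
\le \delta
\qquad \text{for all } x, x',
\]
where $f^{w}$ denotes the network with weights $w$ and $p(w \mid \mathcal{D})$ the (approximate) posterior over weights given data $\mathcal{D}$.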
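To illustrate the kind of attack the abstract refers to, below is a minimal, hypothetical sketch of a single Fair-FGSM step. It is not the paper's exact algorithm: we assume the similarity metric is encoded as per-feature perturbation budgets (so that any point inside the resulting box is $\epsilon$-similar to the input), and we assume `model` resamples weights from the approximate posterior on every stochastic forward pass, as in MC dropout or variational BNNs.

```python
import torch

def fair_fgsm_step(model, x, feature_budgets, n_samples=32):
    """One fairness-aware FGSM step (illustrative sketch only).

    feature_budgets: per-feature perturbation limits derived from the
    similarity metric d_fair (a hypothetical encoding; e.g., a large
    budget on a protected attribute lets it vary freely while keeping
    the pair epsilon-similar).
    """
    x = x.clone().detach().requires_grad_(True)
    # Monte-Carlo estimate of the posterior predictive mean: each
    # forward pass is assumed to draw fresh weights from the posterior.
    y_bar = torch.stack([model(x) for _ in range(n_samples)]).mean(0)
    # Move in the direction that maximally changes the predictive output.
    y_bar.sum().backward()
    # Signed-gradient step, scaled per feature by the fairness budget,
    # so the perturbed point stays epsilon-similar to the original.
    x_adv = (x + feature_budgets * x.grad.sign()).detach()
    return x_adv
```

Fair-PGD would iterate such steps with a projection back onto the $\epsilon$-similarity region after each update; the per-feature budgets make the attack fairness-aware rather than a plain norm-ball attack.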