With the rise of AI, algorithms have become increasingly adept at learning underlying patterns from training data, including ingrained social biases based on gender, race, etc. The deployment of such algorithms in domains such as hiring, healthcare, and law enforcement has raised serious concerns about fairness, accountability, trust, and interpretability in machine learning. To alleviate this problem, we propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases in tabular datasets. It uses a graphical causal model to represent causal relationships among different features in the dataset and as a medium for injecting domain knowledge. A user can detect the presence of bias against a group, say females, or a subgroup, say Black females, by identifying unfair causal relationships in the causal network and using an array of fairness metrics. Thereafter, the user can mitigate bias by acting on the unfair causal edges. For each interaction, say weakening or deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset based on the current causal model. Users can visually assess the impact of their interactions on different fairness metrics, utility metrics, data distortion, and the underlying data distribution. Once satisfied, they can download the debiased dataset and use it in any downstream application for fairer predictions. We evaluate D-BIAS through experiments on three datasets and a formal user study. We find that D-BIAS reduces bias significantly compared to a baseline debiasing approach across different fairness metrics, while incurring little data distortion and a small loss in utility. Moreover, our human-in-the-loop approach significantly outperforms an automated approach on trust, interpretability, and accountability.
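To make the edge-weakening idea concrete, the following is a minimal sketch, not the paper's actual simulation method: it assumes a linear structural equation model, a hypothetical weakening factor, and hypothetical column names ("gender", "education", "income"), and resimulates the child feature after scaling the coefficient of the biased parent. A simple statistical parity difference is included as one example fairness metric.

```python
# Minimal sketch (assumed linear SEM; not D-BIAS's exact algorithm):
# weaken one causal edge and regenerate the child feature.
import pandas as pd
from sklearn.linear_model import LinearRegression


def debias_edge(df: pd.DataFrame, parents: list, child: str,
                biased_parent: str, weight_factor: float = 0.0) -> pd.DataFrame:
    """Refit child = f(parents) + residual, scale the biased parent's
    coefficient by weight_factor (0 deletes the edge), and resimulate."""
    X, y = df[parents].to_numpy(), df[child].to_numpy()
    model = LinearRegression().fit(X, y)
    residual = y - model.predict(X)          # keep each row's idiosyncratic part
    coefs = model.coef_.copy()
    coefs[parents.index(biased_parent)] *= weight_factor
    debiased = df.copy()
    debiased[child] = X @ coefs + model.intercept_ + residual
    return debiased


def statistical_parity_difference(df, group_col, outcome_col, threshold):
    """P(outcome >= threshold | group = 1) - P(outcome >= threshold | group = 0)."""
    favorable = df[outcome_col] >= threshold
    return (favorable[df[group_col] == 1].mean()
            - favorable[df[group_col] == 0].mean())


# Hypothetical usage: delete the gender -> income edge, then re-check fairness.
# df = pd.read_csv("adult.csv")
# new_df = debias_edge(df, parents=["gender", "education"], child="income",
#                      biased_parent="gender", weight_factor=0.0)
# print(statistical_parity_difference(new_df, "gender", "income", threshold=50_000))
```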