The growing use of data-centric systems and algorithms in machine learning has made fairness a topic of considerable attention in the academic and broader literature. This paper introduces Dbias (https://pypi.org/project/Dbias/), an open-source Python package for ensuring fairness in news articles. Dbias takes any text and determines whether it is biased. It then detects the biased words in the text, masks them, and suggests a set of sentences with new words that are bias-free or at least less biased. We conduct extensive experiments to assess the performance of Dbias. To evaluate our approach, we compare it with existing fairness models. We also test the individual components of Dbias to measure how effective each one is. The experimental results show that Dbias outperforms all the baselines in terms of accuracy and fairness. We make Dbias publicly available so that developers and practitioners can mitigate biases in textual data (such as news articles), and to encourage extensions of this work.
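The four stages described above (decide whether a text is biased, find the biased words, mask them, suggest less biased sentences) can be illustrated with a minimal, self-contained sketch. This is not the Dbias API and makes no claim about how the package implements each stage: the word list, the `[MASK]` token, and all function names below are hypothetical, and a simple dictionary lookup stands in for whatever models the package actually uses.

```python
import re

# Hypothetical lexicon (an assumption for illustration only):
# loaded word -> less loaded alternatives.
NEUTRAL_ALTERNATIVES = {
    "slammed": ["criticized", "responded to"],
    "radical": ["new", "proposed"],
    "disastrous": ["unsuccessful", "costly"],
}

MASK = "[MASK]"  # placeholder token, also an assumption

# One pattern that matches any lexicon word as a whole word, case-insensitively.
_BIASED = re.compile(
    r"\b(" + "|".join(map(re.escape, NEUTRAL_ALTERNATIVES)) + r")\b",
    re.IGNORECASE,
)


def is_biased(text):
    """Stage 1: flag the text if it contains any word from the lexicon."""
    return _BIASED.search(text) is not None


def find_biased_words(text):
    """Stage 2: return the lexicon words in the order they appear."""
    return [m.group(0) for m in _BIASED.finditer(text)]


def mask_biased_words(text):
    """Stage 3: replace every flagged word with the mask token."""
    return _BIASED.sub(MASK, text)


def suggest_rewrites(text, n=2):
    """Stage 4: build n candidate sentences.

    Candidate k uses the k-th alternative of each flagged word,
    falling back to the last alternative when fewer are listed.
    """
    candidates = []
    for k in range(n):
        def pick(match, k=k):
            options = NEUTRAL_ALTERNATIVES[match.group(0).lower()]
            return options[min(k, len(options) - 1)]
        candidates.append(_BIASED.sub(pick, text))
    return candidates


if __name__ == "__main__":
    sentence = "The senator slammed the radical plan."
    print(is_biased(sentence))
    print(find_biased_words(sentence))
    print(mask_biased_words(sentence))
    for candidate in suggest_rewrites(sentence):
        print(candidate)
```

A lexicon lookup cannot judge context, so it would flag "radical" even in a neutral chemical sense; the sketch is meant only to make the detect, mask, and suggest flow concrete.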