We introduce bipol, a new explainable metric for estimating social bias in text data. Harmful bias is prevalent in many online data sources used to train machine learning (ML) models. As a step toward addressing this challenge, we create a novel metric that involves a two-step process: corpus-level evaluation based on model classification and sentence-level evaluation based on (sensitive) term frequency (TF). After creating new models to detect bias along multiple axes using state-of-the-art (SotA) architectures, we evaluate two popular NLP datasets (COPA and SQuAD). As an additional contribution, we create a large dataset (with almost 2 million labelled samples) for training models in bias detection and make it publicly available. We also make our code public.
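To make the two-step structure concrete, the following is a minimal sketch in the spirit of bipol, not the paper's exact formula: step 1 computes a corpus-level score as the fraction of samples a bias classifier flags, and step 2 computes a sentence-level score from the imbalance of sensitive-term frequencies per axis over the flagged samples. The `classify` function, the `AXES` lexicon, and the toy terms below are all illustrative assumptions; the paper defines the actual model, term lists, and formula.

```python
from collections import Counter

# Hypothetical two-group lexicon for a single axis; the paper's actual
# multi-axis term lists are far larger and are not reproduced here.
AXES = {
    "gender": {"female": ["she", "her", "woman"], "male": ["he", "him", "man"]},
}

def corpus_level(samples, classify):
    """Step 1: fraction of samples the (assumed) classifier flags as biased."""
    flagged = [s for s in samples if classify(s)]
    return len(flagged) / max(len(samples), 1), flagged

def sentence_level(samples):
    """Step 2: per axis, the normalized absolute difference in sensitive-term
    frequencies between the two groups, averaged over samples and axes."""
    axis_scores = []
    for groups in AXES.values():
        per_sample = []
        for s in samples:
            counts = Counter(s.lower().split())
            freqs = [sum(counts[t] for t in terms) for terms in groups.values()]
            total = sum(freqs)
            if total:  # skip samples containing no sensitive terms
                per_sample.append(abs(freqs[0] - freqs[1]) / total)
        if per_sample:
            axis_scores.append(sum(per_sample) / len(per_sample))
    return sum(axis_scores) / len(axis_scores) if axis_scores else 0.0

def bipol_score(samples, classify):
    """Combine both steps: the corpus-level score scaled by the
    sentence-level score computed over the flagged samples only."""
    b_c, flagged = corpus_level(samples, classify)
    return b_c * sentence_level(flagged)

# Toy usage with a trivial stand-in classifier (an assumption, not the paper's model):
if __name__ == "__main__":
    data = ["he is a doctor and she is a nurse", "the weather is nice today"]
    flag = lambda s: any(w in s.split() for w in ("he", "she"))
    print(bipol_score(data, classify=flag))
```

Because the sentence-level factor is a ratio of term counts, it also supports the explainability claim: the specific terms driving the imbalance can be reported alongside the score.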