Text representation models are prone to exhibit a range of societal biases, reflecting the non-controlled and biased nature of the underlying pretraining data, which consequently leads to severe ethical issues and even bias amplification. Recent work has predominantly focused on measuring and mitigating bias in pretrained language models. Surprisingly, the landscape of bias measurements and mitigation resources and methods for conversational language models is still very scarce: it is limited to only a few types of bias, artificially constructed resources, and completely ignores the impact that debiasing methods may have on the final performance in dialog tasks, e.g., conversational response generation. In this work, we present RedditBias, the first conversational data set grounded in the actual human conversations from Reddit, allowing for bias measurement and mitigation across four important bias dimensions: gender, race, religion, and queerness. Further, we develop an evaluation framework which simultaneously 1) measures bias on the developed RedditBias resource, and 2) evaluates model capability in dialog tasks after model debiasing. We use the evaluation framework to benchmark the widely used conversational DialoGPT model along with the adaptations of four debiasing methods. Our results indicate that DialoGPT is biased with respect to religious groups and that some debiasing techniques can remove this bias while preserving downstream task performance.
翻译:文本代表模式容易表现出一系列社会偏见,反映了基础培训前数据的非控制和偏见性质,从而导致严重的道德问题,甚至扩大偏见。最近的工作主要侧重于衡量和减轻预先培训的语言模式中的偏见。令人惊讶的是,偏见衡量和缓解资源以及对话语言模式方法的格局仍然非常稀少:它仅限于少数几种类型的偏见,人为构建的资源,完全忽视贬低方法可能对对话任务的最后表现(例如对话响应生成)产生的影响。在这项工作中,我们介绍Reddit Bias,这是基于来自Redddit的实际人类对话的第一个对话数据集,允许在性别、种族、宗教和同性恋四个重要的偏见方面衡量和缓解偏见。此外,我们开发了一个评价框架,它同时1)衡量对开发的RedditBias资源的偏差,2)评价模式在模式脱偏见后的对话任务中的示范能力。我们使用评价框架,将广泛使用的DialoGPT模式作为基准,同时将Redddit公司实际对话中的人类对话模式作为衡量基准,同时,允许从性别、宗教、宗教、信仰、信仰、信仰、信仰、选举、选举、选举、选举、选举、选举、选举、选举、选举、选举、选举、选举、选举、选举、选举、选举、选举、选举、选举、选举、选举、选举等等等等等等等等等任务的调整工作的成果。