As Natural Language Processing (NLP) technology rapidly develops and spreads into daily life, it becomes crucial to anticipate how its use could harm people. However, our ways of assessing the biases of NLP models have not kept pace. While the detection of gender bias in English-language models in particular has received increasing research attention, many of the proposed measures face serious problems: it is often unclear what they actually measure and to what extent they are subject to measurement error. In this paper, we take an interdisciplinary approach to the issue of NLP model bias by adopting the lens of psychometrics, a field that specializes in measuring concepts, such as bias, that are not directly observable. We pair an introduction to relevant psychometric concepts with a discussion of how they can be used to evaluate and improve bias measures. We further argue that adopting psychometric vocabulary and methodology can make NLP bias research more efficient and transparent.