Language carries implicit human biases, functioning both as a reflection and a perpetuation of the stereotypes that people carry with them. Recently, ML-based NLP methods such as word embeddings have been shown to learn such language biases with striking accuracy. This capability of word embeddings has been successfully exploited as a tool to quantify and study human biases. However, previous studies either consider only a predefined set of biased concepts to attest (e.g., whether gender is more or less associated with particular jobs), or merely discover biased words without helping to understand their meaning at the conceptual level. As such, these approaches are either unable to find biased concepts that have not been defined in advance, or the biases they find are difficult to interpret and study. This could make existing approaches unsuitable for discovering and interpreting biases in online communities, as such communities may carry biases different from those in mainstream culture. This paper improves upon, extends, and evaluates our previous data-driven method to automatically discover and help interpret biased concepts encoded in word embeddings. We apply this approach to study the biased concepts present in the language used in online communities and experimentally show the validity and stability of our method.
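As a point of reference for the predefined-concept tests mentioned above, the sketch below illustrates how an association between a concept (e.g., a job) and a target attribute (e.g., gender) can be read off embedding vectors via cosine similarity. This is a generic illustration, not the method proposed in the paper; the four-dimensional vectors are hypothetical placeholders standing in for embeddings trained on community text.

```python
# Minimal sketch of embedding-based association measurement (WEAT-style idea).
# The toy vectors are illustrative only, not real embeddings.
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 4-dimensional embeddings for gendered target words and two jobs.
emb = {
    "she":      np.array([ 0.9, 0.1, 0.0, 0.2]),
    "he":       np.array([-0.8, 0.2, 0.1, 0.1]),
    "nurse":    np.array([ 0.7, 0.3, 0.1, 0.2]),
    "engineer": np.array([-0.6, 0.4, 0.2, 0.1]),
}

# Relative association of each job with the two gendered target words:
# positive scores lean toward "she", negative toward "he".
for job in ("nurse", "engineer"):
    score = cosine(emb[job], emb["she"]) - cosine(emb[job], emb["he"])
    print(f"{job}: association score = {score:+.3f}")
```

Such tests require the biased concepts (here, gender and jobs) to be chosen in advance, which is precisely the limitation the data-driven discovery method described above aims to address.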