Word embeddings trained on large corpora have been shown to encode significant levels of unfair discriminatory gender, racial, religious and ethnic biases. In contrast, human-written dictionaries describe the meanings of words in a concise, objective and unbiased manner. We propose a method for debiasing pre-trained word embeddings using dictionaries, without requiring access to the original training resources or any knowledge of the word embedding algorithms used. Unlike prior work, our proposed method does not require the types of biases to be pre-defined in the form of word lists; instead, it learns the constraints that must be satisfied by unbiased word embeddings automatically from the dictionary definitions of words. Specifically, we learn an encoder that generates a debiased version of an input word embedding such that it (a) retains the semantics of the pre-trained word embedding, (b) agrees with the unbiased definition of the word according to the dictionary, and (c) remains orthogonal to the vector space spanned by any biased basis vectors in the pre-trained word embedding space. Experimental results on standard benchmark datasets show that the proposed method accurately removes unfair biases encoded in pre-trained word embeddings while preserving useful semantics.
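As a minimal illustration of constraint (c), the snippet below projects an embedding onto the orthogonal complement of a bias subspace. This is only a sketch of the orthogonality idea, not the authors' actual encoder: the function name, the use of NumPy, and the assumption that bias directions are supplied as explicit vectors are all hypothetical.

```python
import numpy as np

def project_out_bias(v, bias_directions):
    """Return a copy of embedding `v` that is orthogonal to the subspace
    spanned by `bias_directions` (a list of vectors of the same dimension).

    Hypothetical sketch of constraint (c); the paper learns an encoder
    rather than applying a fixed linear projection.
    """
    v = np.asarray(v, dtype=float)
    # Stack bias directions as columns and orthonormalize them via QR,
    # so Q's columns form an orthonormal basis of the bias subspace.
    B = np.asarray(bias_directions, dtype=float).T
    Q, _ = np.linalg.qr(B)
    # Subtract the component of v lying inside the bias subspace.
    return v - Q @ (Q.T @ v)
```

For example, if the bias subspace is spanned by the first coordinate axis, the projection simply zeroes out the first component of the embedding while leaving the rest untouched.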