High-dimensional representations of words, text, images, knowledge graphs, and other structured data are commonly used across different paradigms of machine learning and data mining. These representations vary in their degree of interpretability: efficient distributed representations come at the cost of losing the mapping from features to dimensions. As a result, how concepts are captured in these embedding spaces is obscured. The effects are seen in many representations and tasks; a particularly problematic case arises in language representations, where societal biases learned from the underlying data are captured and occluded in unknown dimensions and subspaces. Consequently, invalid associations (such as associating different races with a polar notion of good versus bad) are encoded and propagated by the representations, leading to unfair outcomes in the tasks that use them. This work addresses some of these problems pertaining to the transparency and interpretability of such representations, with a primary focus on the detection, quantification, and mitigation of socially biased associations in language representations.
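To make the notion of a "biased association" concrete, the sketch below illustrates one common way such associations are quantified in the literature: a WEAT-style effect size that compares the cosine similarity of two sets of target words (e.g., group terms) against two sets of attribute words (e.g., pleasant versus unpleasant terms). This is an illustrative sketch only, not the specific method developed in this work; the vectors used here are hypothetical stand-ins for embeddings produced by a trained model.

```python
# Illustrative sketch of a WEAT-style association measure over word embeddings.
# All vectors below are hypothetical; in practice they would come from a
# trained embedding model.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    # Mean similarity of word vector w to attribute set A minus attribute set B.
    return (np.mean([cosine(w, a) for a in A])
            - np.mean([cosine(w, b) for b in B]))

def weat_effect(X, Y, A, B):
    # Differential association of two target sets X, Y with attribute sets A, B,
    # normalized by the pooled standard deviation (an effect-size style score).
    x_assoc = [association(x, A, B) for x in X]
    y_assoc = [association(y, A, B) for y in Y]
    pooled = np.std(x_assoc + y_assoc, ddof=1)
    return (np.mean(x_assoc) - np.mean(y_assoc)) / pooled

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 50
    # Hypothetical attribute embeddings (e.g., "pleasant" vs "unpleasant" words).
    A = rng.normal(size=(5, dim))
    B = rng.normal(size=(5, dim))
    # Target sets constructed to drift toward A and B respectively,
    # mimicking a biased embedding space.
    X = A[:3] + 0.1 * rng.normal(size=(3, dim))
    Y = B[:3] + 0.1 * rng.normal(size=(3, dim))
    print("WEAT-style effect size:", weat_effect(X, Y, A, B))
```

A near-zero score would indicate no differential association, while a large positive score indicates that the first target set is systematically closer to the first attribute set, which is the kind of invalid association this work aims to detect and mitigate.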