Compared to the numerous debiasing methods proposed for static, non-contextualised word embeddings, the discriminative biases in contextualised embeddings have received relatively little attention. We propose a fine-tuning method that can be applied at the token or sentence level to debias pre-trained contextualised embeddings. Our proposed method can be applied to any pre-trained contextualised embedding model, without requiring those models to be retrained. Using gender bias as an illustrative example, we conduct a systematic study with several state-of-the-art (SoTA) contextualised representations on multiple benchmark datasets to evaluate the level of bias encoded in different contextualised embeddings before and after debiasing with the proposed method. We find that applying token-level debiasing to all tokens and across all layers of a contextualised embedding model produces the best performance. Interestingly, we observe a trade-off between creating an accurate versus an unbiased contextualised embedding model, and different contextualised embedding models respond differently to this trade-off.
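To make the fine-tuning setup concrete, the sketch below shows one plausible form a token-level debiasing objective could take: every token embedding is pushed to be orthogonal to an estimated gender direction, while a regulariser keeps the embeddings close to those of the original, frozen model. This is a minimal illustrative sketch, not the specific loss used in the paper; the model name, the attribute word pairs, and the weight `lam` are assumptions for the example.

```python
# Hypothetical token-level debiasing fine-tuning sketch (illustrative only).
import copy
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"                  # any pre-trained contextualised encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)     # fine-tuned (debiased) copy
frozen = copy.deepcopy(model).eval()              # original model, kept fixed
for p in frozen.parameters():
    p.requires_grad = False

def gender_direction(pairs):
    """Average difference vector over attribute word pairs such as ("he", "she")."""
    diffs = []
    for male, female in pairs:
        ids_m = tokenizer(male, return_tensors="pt")
        ids_f = tokenizer(female, return_tensors="pt")
        with torch.no_grad():
            v_m = frozen(**ids_m).last_hidden_state[0, 1]  # first word-piece after [CLS]
            v_f = frozen(**ids_f).last_hidden_state[0, 1]
        diffs.append(v_m - v_f)
    d = torch.stack(diffs).mean(dim=0)
    return d / d.norm()

g = gender_direction([("he", "she"), ("man", "woman"), ("father", "mother")])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
lam = 1.0   # weight of the "stay close to the original embeddings" regulariser

def debias_step(sentences):
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    h = model(**batch).last_hidden_state             # token-level embeddings being debiased
    with torch.no_grad():
        h_orig = frozen(**batch).last_hidden_state   # reference embeddings from the frozen model
    mask = batch["attention_mask"].unsqueeze(-1)
    # Debiasing term: make every token embedding orthogonal to the gender direction g.
    bias_loss = ((h @ g) ** 2 * mask.squeeze(-1)).sum() / mask.sum()
    # Regulariser: preserve the semantic content of the original embeddings.
    keep_loss = (((h - h_orig) ** 2) * mask).sum() / mask.sum()
    loss = bias_loss + lam * keep_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example fine-tuning step on a small batch of sentences.
print(debias_step(["The doctor finished her shift.", "The nurse updated his notes."]))
```

In this sketch only the debiased copy of the model is updated; applying the orthogonality term to all tokens in every sentence corresponds to the "all tokens, all layers" configuration reported as best in the abstract, although here only the final layer is shown for brevity.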