Iterative Multilingual Spectral Attribute Erasure (IMSAE)

Multilingual representations embed words with similar meanings in a semantic space shared across languages, which creates an opportunity to transfer debiasing effects from one language to another. However, existing debiasing methods cannot exploit this opportunity because they operate on individual languages. We present Iterative Multilingual Spectral Attribute Erasure (IMSAE), which identifies and mitigates joint bias subspaces across multiple languages through iterative SVD-based truncation. Evaluating IMSAE across eight languages and five demographic dimensions, we demonstrate its effectiveness in both the standard setting and the zero-shot setting, in which no data is available for the target language but linguistically similar languages can be used for debiasing. Our experiments across diverse language models (BERT, LLaMA, Mistral) show that IMSAE outperforms traditional monolingual and cross-lingual approaches while maintaining model utility.
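The abstract does not spell out the procedure, so the following is only a minimal sketch of what "iterative SVD-based truncation" over a joint bias subspace could look like. It assumes that the bias subspace is estimated from the cross-covariance between representations and a protected-attribute matrix, and that all languages share one embedding dimension. The function names, the `datasets` layout, and the parameters `k`, `n_iters`, and `tol` are hypothetical and are not taken from the paper.

```python
import numpy as np


def top_bias_directions(X, Z, k):
    """Return the top-k left singular vectors and singular values of the
    cross-covariance between representations X (n, d) and attributes Z (n, m)."""
    Xc = X - X.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    cov = Xc.T @ Zc / X.shape[0]                 # shape (d, m)
    U, S, _ = np.linalg.svd(cov, full_matrices=False)
    return U[:, :k], S[:k]


def iterative_multilingual_erasure(datasets, k=1, n_iters=3, tol=1e-10):
    """Build one projection matrix P of shape (d, d) from several languages.

    datasets: list of (X, Z) pairs, one pair per language (assumed layout).
    Each round pools the currently projected representations of all
    languages, finds the k directions that covary most with the attribute,
    and projects them out.
    """
    d = datasets[0][0].shape[1]
    P = np.eye(d)
    for _ in range(n_iters):
        X_all = np.vstack([X @ P for X, _ in datasets])
        Z_all = np.vstack([Z for _, Z in datasets])
        U, S = top_bias_directions(X_all, Z_all, k)
        if S[0] < tol:                           # nothing left to remove
            break
        P = P @ (np.eye(d) - U @ U.T)            # truncate the joint subspace
    return P
```

Under these assumptions, a representation `x` from any of the pooled languages, or from an unseen but related language in a zero-shot setting, would be debiased as `x @ P`. Whether this matches the pooling and stopping rules used by IMSAE is not stated in the abstract.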
