Multilingual representations embed words with similar meanings in a semantic space shared across languages, creating opportunities to transfer debiasing effects between languages. However, existing debiasing methods cannot exploit this opportunity because they operate on each language in isolation. We present Iterative Multilingual Spectral Attribute Erasure (IMSAE), which identifies and mitigates joint bias subspaces across multiple languages through iterative SVD-based truncation. Evaluating IMSAE across eight languages and five demographic dimensions, we demonstrate its effectiveness in both the standard setting and the zero-shot setting, where target-language data is unavailable but linguistically similar languages can be used for debiasing. Our comprehensive experiments across diverse language models (BERT, LLaMA, Mistral) show that IMSAE outperforms traditional monolingual and cross-lingual approaches while maintaining model utility.
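The iterative SVD-based truncation mentioned above can be illustrated with a minimal sketch: stack bias-defining difference vectors from several languages, take the leading singular direction of the joint matrix as a shared bias direction, project it out of every language's embeddings, and repeat. The function names, the rank-1-per-iteration choice, and the pair-difference construction below are illustrative assumptions rather than the paper's released implementation.

```python
# Hypothetical sketch of iterative multilingual SVD-based debiasing.
# The pair-difference inputs and rank-1 truncation per iteration are
# assumptions for illustration, not the authors' exact procedure.
import numpy as np

def joint_bias_direction(diff_matrices):
    """Stack per-language difference vectors (e.g., gendered word-pair
    differences) and return the top right-singular vector of the joint matrix."""
    joint = np.vstack(diff_matrices)                  # shape (sum_i n_i, d)
    _, _, vt = np.linalg.svd(joint, full_matrices=False)
    return vt[0]                                      # leading shared bias direction

def erase_direction(vectors, direction):
    """Remove the component along `direction` from every row of `vectors`."""
    direction = direction / np.linalg.norm(direction)
    return vectors - np.outer(vectors @ direction, direction)

def imsae_sketch(embeddings_per_lang, diffs_per_lang, n_iters=3):
    """Iteratively estimate a joint bias direction across languages and
    project it out of each language's embeddings and difference vectors."""
    embs = [e.copy() for e in embeddings_per_lang]
    diffs = [d.copy() for d in diffs_per_lang]
    for _ in range(n_iters):
        direction = joint_bias_direction(diffs)
        embs = [erase_direction(e, direction) for e in embs]
        diffs = [erase_direction(d, direction) for d in diffs]
    return embs
```

Because the bias direction is estimated from the stacked multilingual matrix rather than from any single language, languages with little or no attribute-labeled data can still be debiased, which is the mechanism the zero-shot setting relies on.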