Research has shown that while large language models (LLMs) can condition their responses on cultural context, they are imperfect and tend to overgeneralize across cultures. However, when evaluating the cultural bias of a language technology on a dataset, researchers may choose not to engage with the stakeholders who actually use that technology, which sidesteps the very problem they set out to address. Inspired by the work in arXiv:2005.14050v2, I set out to analyse recent literature on identifying and evaluating cultural bias in Natural Language Processing (NLP). I selected 20 papers on cultural bias published in 2025 and distilled a set of observations intended to help future NLP researchers conceptualize bias concretely and evaluate its harms effectively. My aim is to advocate for a robust assessment of the societal impact of language technologies that exhibit cross-cultural bias.