Existing benchmarks for evaluating bias in large language models (LLMs) rely primarily on explicit cues, naming protected attributes such as religion, race, or gender outright. However, real-world interactions often carry implicit bias, where these attributes are inferred subtly from names, cultural cues, or described traits. This oversight creates a significant blind spot in fairness evaluation. We introduce ImplicitBBQ, a benchmark extending the Bias Benchmark for QA (BBQ) with implicitly cued protected attributes across six categories. Our evaluation of GPT-4o on ImplicitBBQ reveals a troubling performance disparity relative to explicit BBQ prompts: accuracy declines by up to 7% in the "sexual orientation" subcategory, with consistent declines across most other categories. These results indicate that current LLMs harbor implicit biases that explicit benchmarks fail to detect. ImplicitBBQ offers a crucial tool for more nuanced fairness evaluation in NLP.