As large language models (LLMs) are integrated into sociotechnical systems, it is crucial to examine the privacy biases they exhibit. We define privacy bias as the appropriateness value of information flows in responses from LLMs. A deviation between privacy biases and expected values, referred to as privacy bias delta, may indicate privacy violations. As an auditing metric, privacy bias can help (a) model trainers evaluate the ethical and societal impact of LLMs, (b) service providers select context-appropriate LLMs, and (c) policymakers assess the appropriateness of privacy biases in deployed LLMs. We formulate and answer a novel research question: how can we reliably examine privacy biases in LLMs and the factors that influence them? We present a novel approach for assessing privacy biases using a contextual integrity-based methodology to evaluate the responses from various LLMs. Our approach accounts for the sensitivity of responses across prompt variations, which hinders the evaluation of privacy biases. Finally, we investigate how privacy biases are affected by model capacities and optimizations.
翻译:随着大型语言模型(LLMs)被整合到社会技术系统中,审视它们所展现的隐私偏见变得至关重要。我们将隐私偏见定义为LLMs响应中信息流的适当性价值。隐私偏见与期望值之间的偏差,称为隐私偏见增量,可能表明存在隐私侵犯。作为一种审计度量,隐私偏见有助于:(a)模型训练者评估LLMs的伦理和社会影响;(b)服务提供商选择适合情境的LLMs;(c)政策制定者评估已部署LLMs中隐私偏见的适当性。我们提出并回答了一个新颖的研究问题:如何可靠地审视LLMs中的隐私偏见及其影响因素?我们提出了一种基于情境完整性方法论的新颖方法,用于评估各种LLMs的响应,从而评估隐私偏见。我们的方法考虑了响应在不同提示变化下的敏感性,这种敏感性阻碍了对隐私偏见的评估。最后,我们研究了模型能力和优化如何影响隐私偏见。