Recent performance gains in large language models (LLMs) have led to the everyday use of AI-based Conversational Agents (CAs). At the same time, LLMs are vulnerable to an array of threats, including jailbreaks and, in some cases, remote code execution triggered by specific inputs. As a result, users may unintentionally introduce risks, for example, by uploading malicious files or disclosing sensitive information. However, the extent to which such user behaviors occur, and thus potentially facilitate exploits, remains largely unclear. To shed light on this issue, we surveyed a representative sample of 3,270 UK adults in 2024 using Prolific. A third of them use CA services such as ChatGPT or Gemini at least once a week. Of these ``regular users'', up to a third exhibited behaviors that may enable attacks, and a fourth have tried jailbreaking (often for understandable reasons such as curiosity, fun, or information seeking). Half state that they sanitize data, and most participants report not sharing sensitive data; however, a few do share very sensitive data such as passwords. The majority are unaware that their data can be used to train models and that they can opt out. Our findings suggest that current academic threat models manifest in the wild and that mitigations or guidelines for the secure use of CAs should be developed. In areas critical to security and privacy, CAs must be equipped with effective AI guardrails to prevent them from, for example, revealing sensitive information to curious employees. Vendors need to step up efforts to prevent the entry of sensitive data and to create transparency regarding data usage policies and settings.