Large language models (LLMs) are increasingly used as personal agents, accessing sensitive user data such as calendars, emails, and medical records. Users currently face a trade-off: they can send private records, many of which are stored in remote databases, to powerful but untrusted LLM providers, increasing their exposure risk. Alternatively, they can run less powerful models locally on trusted devices. We bridge this gap. Our Socratic Chain-of-Thought Reasoning framework first sends a generic, non-private user query to a powerful, untrusted LLM, which generates a Chain-of-Thought (CoT) prompt and detailed sub-queries without accessing user data. Next, we embed these sub-queries and perform sub-second encrypted semantic search using our Homomorphically Encrypted Vector Database over one million entries of a single user's private data, a realistic scale for the personal documents, emails, and records accumulated over years of digital activity. Finally, we feed the CoT prompt and the decrypted records to a local language model, which generates the final response. On the LoCoMo long-context QA benchmark, our hybrid framework, combining GPT-4o with a local Llama-3.2-1B model, outperforms GPT-4o alone by up to 7.1 percentage points. This demonstrates a first step toward systems where tasks are decomposed and split between untrusted strong LLMs and weak, trusted local ones, preserving user privacy.
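To make the encrypted-search step concrete, the sketch below scores a sub-query embedding against record embeddings under CKKS homomorphic encryption using the TenSEAL library. It is a minimal toy illustration under stated assumptions, not the paper's system: the embedding vectors are random stand-ins for a real embedding model's output, the database holds eight entries rather than one million, and none of the packing or batching optimizations needed for sub-second search at scale are shown.

```python
# pip install tenseal numpy
import numpy as np
import tenseal as ts

# --- Client side (trusted device): the secret key never leaves here. ---
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()  # rotations are required by dot()

dim, n_entries = 64, 8  # toy sizes; the paper targets ~1M entries
rng = np.random.default_rng(0)

# Random stand-ins for embeddings of the user's private records
# (a real system would embed documents with an embedding model).
db_embeddings = rng.standard_normal((n_entries, dim))
db_embeddings /= np.linalg.norm(db_embeddings, axis=1, keepdims=True)

# Encrypt each record embedding before it is uploaded to the server.
enc_db = [ts.ckks_vector(context, row.tolist()) for row in db_embeddings]

# Embedding of one sub-query produced by the untrusted LLM
# (again a random stand-in for an embedding model's output).
query = rng.standard_normal(dim)
query /= np.linalg.norm(query)
enc_query = ts.ckks_vector(context, query.tolist())

# --- Server side (untrusted): compute encrypted cosine similarities. ---
# The server only ever manipulates ciphertexts and learns nothing.
enc_scores = [enc_vec.dot(enc_query) for enc_vec in enc_db]

# --- Client side: decrypt the scores and pick the top-k records. ---
scores = np.array([s.decrypt()[0] for s in enc_scores])
top_k = np.argsort(scores)[::-1][:3]
print("top matches:", top_k, scores[top_k].round(3))
```

In the full framework, the client decrypts only the similarity scores, retrieves the matching records, and feeds their plaintext, together with the CoT prompt, to the local model; the untrusted server never sees the user's data or the search results in the clear.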