Toward provably private analytics and insights into GenAI use

Albert Cheu, Artem Lagzdin, Brett McLarnon, Daniel Ramage, Katharine Daly, Marco Gruteser, Peter Kairouz, Rakshita Tandon, Stanislav Chiknavaryan, Timon Van Overveldt, Zoe Gong

Large-scale systems that compute analytics over a fleet of devices must achieve high privacy and security standards while also meeting data quality, usability, and resource efficiency expectations. We present a next-generation federated analytics system that uses Trusted Execution Environments (TEEs) based on technologies like AMD SEV-SNP and Intel TDX to provide verifiable privacy guarantees for all server-side processing. In our system, devices encrypt and upload data, tagging it with a limited set of allowable server-side processing steps. An open source, TEE-hosted key management service guarantees that the data is accessible only to those steps, which are themselves protected by TEE confidentiality and integrity guarantees. The system is designed for flexible workloads, including processing unstructured data with LLMs (for structured summarization) before aggregation into differentially private insights (with automatic parameter tuning). The transparency properties of our system allow any external party to verify that all raw and derived data is processed in TEEs, protecting it from inspection by the system operator, and that differential privacy is applied to all released results. This system has been successfully deployed in production, providing helpful insights into real-world GenAI experiences.
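
The flow described in the abstract (data tagged with allowable steps, key release limited to those steps, and differentially private output) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: `ToyKeyService`, `measurement`, `xor_cipher`, and `dp_count` are hypothetical names, a SHA-256 hash of a string stands in for a TEE attestation measurement, the XOR cipher is not secure encryption, the device receives a symmetric key directly for brevity, and Laplace noise on a count (assuming each device contributes at most one value) stands in for whatever mechanism and automatic parameter tuning the real system uses.

```python
import hashlib
import random
import secrets


def measurement(step_code: str) -> str:
    """Stand-in for a TEE attestation measurement (here: SHA-256 of the code)."""
    return hashlib.sha256(step_code.encode()).hexdigest()


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher for illustration only; NOT secure encryption."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    # zip stops at len(data), so extra keystream bytes are ignored.
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKeyService:
    """Hypothetical key service that gates key release on an allowlist."""

    def __init__(self):
        self._entries = {}  # key_id -> (key, allowed step measurements)

    def create_key(self, key_id, allowed_steps):
        # Simplification: the symmetric key is handed to the device directly.
        key = secrets.token_bytes(32)
        self._entries[key_id] = (key, frozenset(allowed_steps))
        return key

    def release_key(self, key_id, attested_measurement):
        key, allowed = self._entries[key_id]
        if attested_measurement not in allowed:
            raise PermissionError("step is not in the upload's allowed set")
        return key


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(values, epsilon, rng):
    """Noisy count; assumes each device contributes at most one value."""
    return len(values) + laplace_noise(1 / epsilon, rng)


if __name__ == "__main__":
    approved = measurement("summarize_then_aggregate_v1")  # hypothetical step
    rogue = measurement("dump_raw_data")                   # hypothetical step
    kms = ToyKeyService()

    # Device side: encrypt and tag the upload with the allowed step.
    key = kms.create_key("upload-1", {approved})
    blob = xor_cipher(key, b"topic=travel")

    # Server side: only the approved step obtains the key.
    print(xor_cipher(kms.release_key("upload-1", approved), blob))
    try:
        kms.release_key("upload-1", rogue)
    except PermissionError as err:
        print("denied:", err)

    # Only a noisy aggregate is released.
    print(dp_count(["travel", "coding", "travel"], 1.0, random.Random()))
```

The point of the design is that the policy check sits in the key service rather than in the processing code, so a step that was not listed at upload time never sees plaintext, whatever the operator deploys.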
