A key promise of machine learning is the ability to assist users with personal tasks. Because the personal context required to make accurate predictions is often sensitive, we require systems that protect privacy. A gold-standard privacy-preserving system will satisfy perfect secrecy, meaning that interactions with the system provably reveal no private information. However, privacy and quality appear to be in tension in existing systems for personal tasks. Neural models typically require copious amounts of training data to perform well, while individual users typically hold only limited data, so federated learning (FL) systems propose to learn from the aggregate data of multiple users. FL does not provide perfect secrecy; instead, practitioners apply statistical notions of privacy -- i.e., the probability of learning private information about a user should be reasonably low. The strength of the privacy guarantee is governed by privacy parameters. Numerous privacy attacks have been demonstrated on FL systems, and it can be challenging to reason about the appropriate privacy parameters for a privacy-sensitive use case. Therefore, our work proposes a simple baseline for FL, which both provides the stronger perfect secrecy guarantee and does not require setting any privacy parameters. We initiate the study of when and where an emerging tool in ML -- the in-context learning abilities of recent pretrained models -- can serve as an effective baseline alongside FL. We find in-context learning is competitive with strong FL baselines on 6 of 7 popular benchmarks from the privacy literature and a real-world case study, which is disjoint from the pretraining data. We release our code here: https://github.com/simran-arora/focus
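To make the in-context learning baseline concrete: the key property is that a user's private examples are only ever assembled into a prompt and fed to a model locally, so no personal data leaves the device. The sketch below shows only the prompt-assembly step under that assumption; all function names, example strings, and the classification task are hypothetical illustrations, not the paper's actual benchmarks or code.

```python
# Hypothetical sketch: assembling a few-shot in-context prompt entirely
# on-device, so personal data never leaves the user's machine.
# Names and examples are illustrative, not taken from the paper.

def build_prompt(instruction, examples, query):
    """Format labeled (input, output) examples plus a new query
    into a single in-context learning prompt string."""
    lines = [instruction]
    for x, y in examples:
        lines.append(f"Input: {x}\nOutput: {y}")
    # The final block leaves "Output:" blank for the model to complete.
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

# Private examples stay local; only a locally run model would see them.
examples = [
    ("meet dentist tues 3pm", "calendar"),
    ("wire $200 to savings", "banking"),
]
prompt = build_prompt(
    "Classify each note into a personal-task category.",
    examples,
    "refill blood pressure meds",
)
print(prompt)
```

A locally hosted pretrained model would then complete the prompt; because no gradients or examples are transmitted, the interaction reveals no private information to any other party, in contrast to FL's statistical guarantees.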