Imagine a group of citizens willing to contribute their personal data for the common good, in order to produce socially useful information through data analytics or machine learning computations. Sharing raw personal data with a centralized server that performs the computation could raise privacy concerns and a perceived risk of mass surveillance. Instead, citizens may trust each other and their own devices to engage in a decentralized computation that collaboratively produces an aggregate data release to be shared. When secure computing nodes exchange messages over secure channels at runtime, a key security issue is protecting against external attackers who observe the traffic, because the dependence of the traffic on the data may reveal personal information. Existing solutions are designed for the cloud setting, where the goal is to hide all properties of the underlying dataset, and they do not address the specific privacy and efficiency challenges that arise in the context above. In this paper, we define a general execution model to control the data dependence of communications in user-side decentralized computations, in which differential privacy guarantees for communication patterns in global execution plans can be analyzed by combining guarantees obtained on local clusters of nodes. We propose a set of algorithms that make it possible to trade off privacy, utility and efficiency. Our formal privacy guarantees leverage and extend recent results on privacy amplification by shuffling. We illustrate the usefulness of our proposal on two representative examples of decentralized execution plans with data-dependent communications.
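The abstract does not say which shuffling bound the analysis builds on. As one concrete illustration of privacy amplification by shuffling, the sketch below evaluates a known published result rather than this paper's extension of it: the closed-form bound of Feldman, McMillan and Talwar (2021). If each of n nodes applies an ε0-differentially private local randomizer and the messages are shuffled, the shuffled output is (ε, δ)-differentially private, provided ε0 ≤ log(n / (16 log(2/δ))). The helper name `shuffled_epsilon` is hypothetical. Treating one local cluster of nodes as the shuffled population is also an assumption made purely for illustration.

```python
import math


def shuffled_epsilon(eps0: float, n: int, delta: float) -> float:
    """Central epsilon after shuffling n messages from eps0-DP local randomizers.

    Evaluates the closed-form bound of Feldman, McMillan and Talwar (2021).
    Hypothetical helper: treating one cluster of n nodes as the shuffled
    population is an illustrative assumption, not the paper's analysis.
    """
    if eps0 <= 0 or n < 1 or not 0 < delta < 1:
        raise ValueError("need eps0 > 0, n >= 1 and 0 < delta < 1")
    # The bound only applies when eps0 <= log(n / (16 log(2/delta))).
    if eps0 > math.log(n / (16 * math.log(2 / delta))):
        raise ValueError("eps0 too large for this n and delta")
    e = math.exp(eps0)
    scale = (e - 1) / (e + 1)
    term = 8 * math.sqrt(e * math.log(4 / delta)) / math.sqrt(n) + 8 * e / n
    # log1p keeps precision when the amplified epsilon is small.
    return math.log1p(scale * term)


if __name__ == "__main__":
    # A cluster of 10,000 nodes, each using a local randomizer with eps0 = 1.
    print(round(shuffled_epsilon(1.0, 10_000, 1e-6), 3))
```

With n = 10,000 nodes, ε0 = 1 and δ = 10^-6, this bound gives ε of roughly 0.21. Larger clusters shrink the effective privacy loss further, while clusters too small for the precondition are rejected. This is one way to see why guarantees obtained on local clusters depend on cluster size.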