Local Differential Privacy (LDP) is now widely adopted in large-scale systems to collect and analyze sensitive data while preserving users' privacy. However, almost all LDP protocols rely on a semi-trusted model in which users are curious but honest, an assumption that rarely holds in real-world scenarios. Recent works show that many LDP protocols suffer from poor estimation accuracy under malicious threat models. Although a few works have proposed countermeasures against these attacks, they all require prior knowledge of either the attack pattern or the poison value distribution, which is impractical because attackers can easily evade such countermeasures. In this paper, we adopt a general opportunistic-and-colluding threat model and propose a multi-group Differential Aggregation Protocol (DAP) to improve the accuracy of mean estimation under LDP. Unlike existing works, which detect poison values on an individual basis, DAP mitigates the overall impact of poison values on the estimated mean. It relies on a new probing mechanism, the Expectation-Maximization Filter (EMF), to estimate features of the attackers. In addition to EMF, DAP consists of two EMF post-processing procedures (EMF* and CEMF*) and a group-wise mean aggregation scheme that optimizes the final estimated mean to achieve the smallest variance. Extensive experimental results on both synthetic and real-world datasets demonstrate the superior performance of DAP over state-of-the-art solutions.
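The abstract does not spell out how the group-wise mean aggregation scheme picks its weights. As a rough illustration of what "smallest variance" can mean when combining several group estimates, the sketch below uses standard inverse-variance weighting, which gives the minimum-variance linear combination of independent unbiased estimates. The function name `aggregate_group_means` and the assumption that each group's variance is known are hypothetical and are not taken from DAP itself.

```python
from typing import Sequence, Tuple


def aggregate_group_means(
    means: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Combine independent, unbiased group means with inverse-variance weights.

    Hypothetical helper for illustration only; not the DAP aggregation rule.
    Returns the combined mean and the variance of that combined mean.
    """
    if len(means) != len(variances) or len(means) == 0:
        raise ValueError("means and variances must be non-empty and equal in length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")

    # Each group's weight is proportional to 1 / variance, so noisier groups count less.
    inverse_variances = [1.0 / v for v in variances]
    total = sum(inverse_variances)

    combined_mean = sum(w * m for w, m in zip(inverse_variances, means)) / total
    # The variance of the weighted mean under these weights is 1 / sum(1 / variance_i).
    combined_variance = 1.0 / total
    return combined_mean, combined_variance


if __name__ == "__main__":
    # Two groups: the first estimate is four times less noisy than the second.
    mean, variance = aggregate_group_means([0.2, 0.4], [0.01, 0.04])
    print(mean, variance)
```

Under these assumptions the combined estimate is pulled toward the lower-variance group, and its variance is never larger than that of the best single group.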