Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker can recover the identity of a user from their representation. As an application, we show that our framework is general enough to model important real-world applications such as Chrome's Topics API for interest-based advertising. We complement our theoretical bounds with provably good attack algorithms for re-identification, which we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk, together with a framework for measuring it that can inform real-world applications.