Differential privacy is a rigorous definition of privacy that guarantees that any analysis performed on a sensitive dataset reveals almost nothing specific to any individual whose data it contains. In this work, we develop new differentially private algorithms for analyzing streaming data. Specifically, we consider the problem of estimating the density of a stream of users (or, more generally, elements), that is, the fraction of all users that actually appear in the stream. We focus on one of the strongest privacy guarantees for the streaming model, namely user-level pan-privacy, which protects the privacy of every user even against an adversary that observes, on rare occasions, the internal state of the algorithm. Our proposed algorithms make optimal use of the entire allocated privacy budget and are specially tailored to the streaming model; hence, they outperform the conventional sampling-based approach both theoretically and experimentally.
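For orientation, a sampling-based pan-private density estimator can be sketched roughly as follows. This is a minimal sketch, assuming the baseline resembles the classic randomized-bit construction from the pan-privacy literature: keep one noisy bit per sampled user, redraw it from a slightly biased coin whenever that user appears, and debias the fraction of ones at the end. It is not the algorithm proposed in this work, and the privacy accounting is not verified here. The names `laplace_noise` and `pan_private_density`, and the parameters `epsilon` and `sample_size`, are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def pan_private_density(stream, universe, epsilon, sample_size, rng):
    """Estimate the fraction of `universe` (a list) that appears in `stream`.

    Illustrative sketch only: parameter choices are assumptions.
    """
    sampled = rng.sample(universe, sample_size)
    # Each sampled user starts with a uniformly random bit, so the internal
    # state is already noisy before any stream element is processed.
    bits = {user: rng.random() < 0.5 for user in sampled}
    p_seen = 0.5 + epsilon / 4.0  # bias of the coin used for users that appear

    for user in stream:
        if user in bits:
            # Redraw on every appearance: the bit's distribution depends only
            # on whether the user appeared, not on how many times.
            bits[user] = rng.random() < p_seen

    ones_fraction = sum(bits.values()) / sample_size
    # Output perturbation, so that the published value is also noisy.
    noisy_fraction = ones_fraction + laplace_noise(1.0 / (epsilon * sample_size), rng)
    # E[ones_fraction] = 1/2 + (epsilon / 4) * density, so invert that relation.
    return 4.0 * (noisy_fraction - 0.5) / epsilon


if __name__ == "__main__":
    rng = random.Random(0)
    universe = list(range(50_000))
    stream = list(range(12_500)) * 3  # a quarter of the users, each seen 3 times
    print(pan_private_density(stream, universe, 0.5, 50_000, rng))
```

Because the signal is scaled by `epsilon / 4`, the sampling error is amplified by a factor of `4 / epsilon` when debiasing, which is one way to see why a sampling-based estimator can be noisy at small privacy budgets.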