节点差分隐私估计连接组件数量 (Node-Differentially Private Estimation of the Number of Connected Components)

We design the first node-differentially private algorithm for approximating the number of connected components in a graph. Given a database representing an $n$-vertex graph $G$ and a privacy parameter $\varepsilon$, our algorithm runs in polynomial time and, with probability $1-o(1)$, has additive error $\widetilde{O}(\frac{\Delta^*\ln\ln n}{\varepsilon}),$ where $\Delta^*$ is the smallest possible maximum degree of a spanning forest of $G.$ Node-differentially private algorithms are known only for a small number of database analysis tasks. A major obstacle for designing such an algorithm for the number of connected components is that this graph statistic is not robust to adding one node with arbitrary connections (a change that node-differential privacy is designed to hide): every graph is a neighbor of a connected graph. We overcome this by designing a family of efficiently computable Lipschitz extensions of the number of connected components or, equivalently, the size of a spanning forest. The construction of the extensions, which is at the core of our algorithm, is based on the forest polytope of $G.$ We prove several combinatorial facts about spanning forests, in particular, that a graph with no induced $\Delta$-stars has a spanning forest of degree at most $\Delta$. With this fact, we show that our Lipschitz extensions for the number of connected components equal the true value of the function for the largest possible monotone families of graphs. More generally, on all monotone sets of graphs, the $\ell_\infty$ error of our Lipschitz extensions is nearly optimal.

翻译：我们设计了第一个节点差分隐私算法，用于近似估计图中连接组件的数量。给定一个表示$n$个顶点的图$G$的数据库和隐私参数$\varepsilon$，我们的算法在多项式时间内运行，并且，概率为$1-o(1)$，具有$\widetilde{O}(\frac{\Delta^* \ln\ln n}{\varepsilon})$的加法误差，其中$\Delta^*$是$G$的最小可能的生成森林最大度。已知只有少数数据库分析任务可以设计节点差分隐私算法以实现保护。设计这样的用于连接组件数量的算法的主要障碍是该图统计量对于任意连接的一个节点（节点差分隐私的设计目的是隐藏这一变化）都不具有鲁棒性：每个图都是连通图的邻居。我们通过设计一组有效计算的连接组件数量或等效地，生成森林大小的Lipschitz扩展来克服这个障碍。核心算法的扩展构造基于$G$的森林多面体。我们证明了关于生成森林的几个组合事实，特别是，没有诱导$\Delta$-星形的图具有最多度为$\Delta$的生成森林。通过这一事实，我们展示了用于连接组件数量的Lipschitz扩展对于最大可能单调图族具有相等的扩展值。更一般地，在所有单调图族上，我们的Lipschitz扩展的$\ell_\infty$误差几乎是最优的。