节点差分隐私估计连通分量数量 (Node-Differentially Private Estimation of the Number of Connected Components)

We design the first node-differentially private algorithm for approximating the number of connected components in a graph. Given a database representing an $n$-vertex graph $G$ and a privacy parameter $\varepsilon$, our algorithm runs in polynomial time and, with probability $1-o(1)$, has additive error $\widetilde{O}(\frac{\Delta^*\ln\ln n}{\varepsilon}),$ where $\Delta^*$ is the smallest possible maximum degree of a spanning forest of $G.$ Node-differentially private algorithms are known only for a small number of database analysis tasks. A major obstacle for designing such an algorithm for the number of connected components is that this graph statistic is not robust to adding one node with arbitrary connections (a change that node-differential privacy is designed to hide): {\em every} graph is a neighbor of a connected graph. We overcome this by designing a family of efficiently computable Lipschitz extensions of the number of connected components or, equivalently, the size of a spanning forest. The construction of the extensions, which is at the core of our algorithm, is based on the forest polytope of $G.$ We prove several combinatorial facts about spanning forests, in particular, that a graph with no induced $\Delta$-stars has a spanning forest of degree at most $\Delta$. With this fact, we show that our Lipschitz extensions for the number of connected components equal the true value of the function for the largest possible monotone families of graphs. More generally, on all monotone sets of graphs, the $\ell_\infty$ error of our Lipschitz extensions is nearly optimal.

翻译：我们设计了第一个节点差分隐私算法，用于近似图中连通组件的数量。给定表示$n$个顶点图$G$的数据库和隐私参数$\varepsilon$，我们的算法在多项式时间内运行，并且以$1-o(1)$的概率具有加性误差$\widetilde{O}(\frac{\Delta^*\ln\ln n}{\varepsilon})$，其中$\Delta^*$是$G$的生成森林的最小可能最大度数。节点差分隐私算法仅对少数数据库分析任务已知。为了为连通分量的数量设计这样的算法，一个主要障碍是此图形统计量对添加一个具有任意连接的节点不具有鲁棒性（这是节点差分隐私旨在隐藏的变化）：$每个$图都是连通图的邻居。我们通过设计一组可以高效计算的连续Lipschitz扩展来克服这一点，用于连通分量的数量或等效地，用于生成森林的大小。构建这些扩展，这是我们算法的核心，是基于$G$的森林多面体。我们证明了关于生成森林的若干组合事实，特别是，没有诱导$\Delta$-星的图具有最多$\Delta$的生成森林度。通过这一事实，我们展示了连通分量数量的Lipschitz扩展对于最大可能的单调图族具有相等的真实值。更一般地，在所有单调图集上，我们的Lipschitz扩展的$\ell_\infty$误差几乎是最优的。