基于矩阵相干性的紧致差分隐私主成分分析 (Tight Differentially Private PCA via Matrix Coherence)

We revisit the task of computing the span of the top $r$ singular vectors $u_1, \ldots, u_r$ of a matrix under differential privacy. We show that a simple and efficient algorithm -- based on singular value decomposition and standard perturbation mechanisms -- returns a private rank-$r$ approximation whose error depends only on the \emph{rank-$r$ coherence} of $u_1, \ldots, u_r$ and the spectral gap $\sigma_r - \sigma_{r+1}$. This resolves a question posed by Hardt and Roth~\cite{hardt2013beyond}. Our estimator outperforms the state of the art -- significantly so in some regimes. In particular, we show that in the dense setting, it achieves the same guarantees for single-spike PCA in the Wishart model as those attained by optimal non-private algorithms, whereas prior private algorithms failed to do so. In addition, we prove that (rank-$r$) coherence does not increase under Gaussian perturbations. This implies that any estimator based on the Gaussian mechanism -- including ours -- preserves the coherence of the input. We conjecture that similar behavior holds for other structured models, including planted problems in graphs. We also explore applications of coherence to graph problems. In particular, we present a differentially private algorithm for Max-Cut and other constraint satisfaction problems under low coherence assumptions.

翻译：我们重新审视在差分隐私约束下计算矩阵前 $r$ 个奇异向量 $u_1, \\ldots, u_r$ 张成空间的任务。我们证明，一种基于奇异值分解和标准扰动机制的简单高效算法能够返回一个私有秩-$r$ 近似，其误差仅取决于 $u_1, \\ldots, u_r$ 的\\emph{秩-$r$ 相干性}以及谱间隙 $\\sigma_r - \\sigma_{r+1}$。这解决了 Hardt 和 Roth~\\cite{hardt2013beyond} 提出的问题。我们的估计器优于现有技术——在某些机制中显著更优。具体而言，我们表明在稠密设置下，该算法在 Wishart 模型中为单峰主成分分析实现了与最优非私有算法相同的保证，而先前的私有算法未能做到这一点。此外，我们证明（秩-$r$）相干性在高斯扰动下不会增加。这意味着任何基于高斯机制的估计器——包括我们的方法——都能保持输入的相干性。我们推测类似行为也适用于其他结构化模型，包括图中的植入问题。我们还探讨了相干性在图问题中的应用。特别地，我们提出了一种在低相干性假设下用于最大割问题及其他约束满足问题的差分隐私算法。

相关内容

PCA

关注 3

在统计中，主成分分析（PCA）是一种通过最大化每个维度的方差来将较高维度空间中的数据投影到较低维度空间中的方法。给定二维，三维或更高维空间中的点集合，可以将“最佳拟合”线定义为最小化从点到线的平均平方距离的线。可以从垂直于第一条直线的方向类似地选择下一条最佳拟合线。重复此过程会产生一个正交的基础，其中数据的不同单个维度是不相关的。这些基向量称为主成分。

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日