Given a matrix $A \in \mathbb{R}^{m\times d}$ with singular values $\sigma_1\geq \cdots \geq \sigma_d$, and a random matrix $G \in \mathbb{R}^{m\times d}$ with iid $N(0,T)$ entries for some $T>0$, we derive new bounds on the Frobenius distance between the subspaces spanned by the top-$k$ (right) singular vectors of $A$ and $A+G$. This problem arises in numerous applications in statistics, where a data matrix may be corrupted by Gaussian noise, and in the analysis of the Gaussian mechanism in differential privacy, where Gaussian noise is added to data to preserve private information. We show that, for matrices $A$ where the gaps in the top-$k$ singular values are roughly $\Omega(\sigma_k-\sigma_{k+1})$, the expected Frobenius distance between the subspaces is $\tilde{O}(\frac{\sqrt{d}}{\sigma_k-\sigma_{k+1}} \times \sqrt{T})$, improving on previous bounds by a factor of $\frac{\sqrt{m}}{\sqrt{d}} \sqrt{k}$. To obtain our bounds, we view the perturbation to the singular vectors as a diffusion process (the Dyson-Bessel process) and use tools from stochastic calculus to track the evolution of the subspace spanned by the top-$k$ singular vectors.
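As a rough numerical illustration of the quantity being bounded, the sketch below measures the distance between the top-$k$ right singular subspaces of $A$ and $A+G$. It rests on assumptions not fixed by the text above: the subspace distance is taken to be the Frobenius norm of the difference of the two orthogonal projectors, $T$ is read as the variance of each entry of $G$, and the helper names, dimensions, and singular values are hypothetical choices made only for this example.

```python
import numpy as np


def top_k_right_projector(M, k):
    """Orthogonal projector onto the span of the top-k right singular vectors of M."""
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    Vk = Vt[:k].T  # d x k matrix with orthonormal columns
    return Vk @ Vk.T


def subspace_frobenius_distance(A, B, k):
    """Frobenius norm of the difference of the two top-k projectors (assumed metric)."""
    diff = top_k_right_projector(A, k) - top_k_right_projector(B, k)
    return np.linalg.norm(diff, ord="fro")


rng = np.random.default_rng(0)
m, d, k, T = 200, 20, 3, 0.01  # hypothetical sizes and noise variance

# Build a test matrix A with prescribed singular values (illustrative choice).
U, _ = np.linalg.qr(rng.standard_normal((m, d)))  # m x d, orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((d, d)))  # d x d, orthogonal
sigma = np.concatenate([[30.0, 20.0, 10.0], np.ones(d - 3)])
A = U @ np.diag(sigma) @ V.T

# iid N(0, T) entries: standard normals scaled by sqrt(T), so each has variance T.
G = np.sqrt(T) * rng.standard_normal((m, d))

dist = subspace_frobenius_distance(A, A + G, k)
gap = sigma[k - 1] - sigma[k]  # sigma_k - sigma_{k+1}
print(f"distance = {dist:.4f}, sqrt(d*T)/gap = {np.sqrt(d * T) / gap:.4f}")
```

The printed ratio $\sqrt{dT}/(\sigma_k-\sigma_{k+1})$ is only the leading term of the stated bound with constants and logarithmic factors dropped, so it serves as a scale reference rather than a guaranteed upper bound for any single draw.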