Private data analysis faces a significant challenge known as the curse of dimensionality, leading to increased costs. However, many datasets possess an inherent low-dimensional structure. For instance, during optimization via gradient descent, the gradients frequently reside near a low-dimensional subspace. If the low-dimensional structure could be privately identified using a small amount of points, we could avoid paying (in terms of privacy and accuracy) for the high ambient dimension. On the negative side, Dwork, Talwar, Thakurta, and Zhang (STOC 2014) proved that privately estimating subspaces, in general, requires an amount of points that depends on the dimension. But Singhal and Steinke (NeurIPS 2021) bypassed this limitation by considering points that are i.i.d. samples from a Gaussian distribution whose covariance matrix has a certain eigenvalue gap. Yet, it was still left unclear whether we could provide similar upper bounds without distributional assumptions and whether we could prove lower bounds that depend on similar eigenvalue gaps. In this work, we make progress in both directions. We formulate the problem of private subspace estimation under two different types of singular value gaps of the input data and prove new upper and lower bounds for both types. In particular, our results determine what type of gap is sufficient and necessary for estimating a subspace with an amount of points that is independent of the dimension.
翻译:暂无翻译