小反向偏差小的粗乱草图 (Sparse sketches with small inversion bias)

For a tall $n\times d$ matrix $A$ and a random $m\times n$ sketching matrix $S$, the sketched estimate of the inverse covariance matrix $(A^\top A)^{-1}$ is typically biased: $E[(\tilde A^\top\tilde A)^{-1}]\ne(A^\top A)^{-1}$, where $\tilde A=SA$. This phenomenon, which we call inversion bias, arises, e.g., in statistics and distributed optimization, when averaging multiple independently constructed estimates of quantities that depend on the inverse covariance. We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices. We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, then after simple rescaling, the estimator $(\frac m{m-d}\tilde A^\top\tilde A)^{-1}$ is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m=O(d+\sqrt d/\epsilon)$. This implies that for $m=O(d)$, the inversion bias of this estimator is $O(1/\sqrt d)$, which is much smaller than the $\Theta(1)$ approximation error obtained as a consequence of the subspace embedding guarantee for sub-gaussian sketches. We then propose a new sketching technique, called LEverage Score Sparsified (LESS) embeddings, which uses ideas from both data-oblivious sparse embeddings as well as data-aware leverage-based row sampling methods, to get $\epsilon$ inversion bias for sketch size $m=O(d\log d+\sqrt d/\epsilon)$ in time $O(\text{nnz}(A)\log n+md^2)$, where nnz is the number of non-zeros. The key techniques enabling our analysis include an extension of a classical inequality of Bai and Silverstein for random quadratic forms, which we call the Restricted Bai-Silverstein inequality; and anti-concentration of the Binomial distribution via the Paley-Zygmund inequality, which we use to prove a lower bound showing that leverage score sampling sketches generally do not achieve small inversion bias.

翻译：对于一个高额美元美元基质美元和随机美元基质美元基质美元美元美元美元基质美元美元基质美元美元美元基质美元美元基质美元美元美元基质美元美元基质现象我们称之为反向偏差, 在统计和分配优化中出现, 当平均多个独立构建的基质美元美元基质基质美元基质基质基质美元基质美元基质基质数据以美元基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质基质, 基质基质基质基质基质基质基质基质基质基质, 基质基质基质基质基质基质基质基质基质基质基质基质基质基质, 基质基质基质基质基质基质, 基质基质基质基质, 基质基质, 基质基质,,,, 基质,,,, 基质,, 基基, 基质,,,, 基质基质, 基质,, 基质, 基质,,,,,,, 基, 基质, 基质,,, 基质, 基质,, 基基质,, 基