We study the square root bottleneck in the recovery of sparse vectors from quadratic equations. It is well known that a sparse vector $\mathbf x_0\in \mathbb{R}^n$ with $\| \mathbf x_0\|_0 = k$ can in theory be recovered from as few as $O(k)$ generic quadratic equations, yet no polynomial-time algorithm is known for this task unless $m = \Omega(k^2)$. Previous work has shown that this bottleneck is essentially tied to the initialization of descent algorithms: starting such an algorithm sufficiently close to the planted signal is known to guarantee convergence to that signal. In this paper, we show that as soon as $m\gtrsim \mu_0^{-2}k \vee \mu_0^{-4}$ (up to log factors), where $\mu_0 = \| \mathbf x_0\|_\infty/\| \mathbf x_0\|_2$, it is possible to recover a $k$-sparse vector $\mathbf x_0\in \mathbb{R}^n$ from $m$ quadratic equations of the form $\langle \mathbf A_i, \mathbf x \mathbf x^\intercal\rangle = \langle \mathbf A_i, \mathbf x_0 \mathbf x_0^\intercal\rangle + \varepsilon_i$ by minimizing the classical empirical loss. The proof idea carries over to the phase retrieval setting, for which it provides an original initialization matching the current optimal sample complexity (see e.g. [Cai 2023]). In the maximally incoherent regime $\mu_0^{-2}=k$ and for $m=o(k^2)$, we provide evidence of topological hardness by showing that the Overlap Gap Property (OGP), a property that originated in spin glass theory and is conjectured to be indicative of algorithmic intractability when optimizing over random structures, holds at a particular level of overparametrization. The key ingredient of the proof is a lower bound on the tail of chi-squared random variables, which follows from the theory of moderate deviations.
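For concreteness, a minimal formulation of the recovery program (assuming that "the classical empirical loss" refers to the standard least-squares objective over the measurements, which is not spelled out above):
\[
\min_{\mathbf x\in\mathbb{R}^n}\;\frac{1}{m}\sum_{i=1}^m\Big(\langle \mathbf A_i,\mathbf x\mathbf x^\intercal\rangle - y_i\Big)^2,\qquad y_i=\langle \mathbf A_i,\mathbf x_0\mathbf x_0^\intercal\rangle+\varepsilon_i .
\]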