This paper studies the problem of shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we consider the model $y = \Pi_* X \beta_* + w$, where $X$ is an $n \times d$ standard Gaussian design matrix, $w$ is Gaussian noise with entrywise variance $\sigma^2$, $\Pi_*$ is an unknown $n \times n$ permutation matrix, and $\beta_*$ is the regression coefficient, also unknown. Previous work has shown that, in the large $n$-limit, the minimal signal-to-noise ratio ($\mathsf{SNR}$), $\lVert \beta_* \rVert^2/\sigma^2$, for recovering the unknown permutation exactly with high probability is between $n^2$ and $n^C$ for some absolute constant $C$ and the sharp threshold is unknown even for $d=1$. We show that this threshold is precisely $\mathsf{SNR} = n^4$ for exact recovery throughout the sublinear regime $d=o(n)$. As a by-product of our analysis, we also determine the sharp threshold of almost exact recovery to be $\mathsf{SNR} = n^2$, where all but a vanishing fraction of the permutation is reconstructed.
翻译:暂无翻译