We consider the problem of robustly testing the norm of a high-dimensional sparse signal vector under two different observation models. In the first model, we are given $n$ i.i.d. samples from the distribution $\mathcal{N}\left(\theta,I_d\right)$ (with unknown $\theta$), of which an $\epsilon$-fraction has been arbitrarily corrupted. Under the promise that $\|\theta\|_0\le s$, we want to correctly distinguish whether $\|\theta\|_2=0$ or $\|\theta\|_2>\gamma$, for some input parameter $\gamma>0$. We show that any algorithm for this task requires $n=\Omega\left(s\log\frac{ed}{s}\right)$ samples, which is tight up to logarithmic factors. We also extend our results to other common notions of sparsity, namely, $\|\theta\|_q\le s$ for any $0 < q < 2$. In the second observation model, the data is generated according to a sparse linear regression model, where the covariates are i.i.d. Gaussian and the regression coefficient (signal) is known to be $s$-sparse. Here too we assume that an $\epsilon$-fraction of the data is arbitrarily corrupted. We show that any algorithm that reliably tests the norm of the regression coefficient requires at least $n=\Omega\left(\min\left(s\log d,1/\gamma^4\right)\right)$ samples. Our results show that the complexity of testing in these two settings increases significantly under robustness constraints. This is in line with recent observations made in robust mean testing and robust covariance testing.
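To make the first observation model concrete, the following is a minimal sketch of how corrupted samples could be generated. It is an illustration under stated assumptions, not a construction from the paper: the function name `sample_corrupted_gaussian`, the equal-magnitude choice of $\theta$ (which gives $\|\theta\|_2$ exactly equal to `gamma`, so `gamma=0` reproduces the null hypothesis), and the adversary that overwrites samples with a fixed outlier are all hypothetical. A real adversary may corrupt the chosen samples arbitrarily.

```python
import random


def sample_corrupted_gaussian(n, d, s, gamma, eps, seed=0):
    """Draw n samples from N(theta, I_d) with an s-sparse theta,
    then corrupt an eps-fraction of them (illustrative adversary)."""
    rng = random.Random(seed)

    # Assumed signal: s nonzero coordinates of equal size, so ||theta||_2 = gamma.
    support = rng.sample(range(d), s)
    theta = [0.0] * d
    for j in support:
        theta[j] = gamma / s ** 0.5

    # Clean samples: each coordinate is Gaussian with mean theta[j] and unit variance.
    samples = [[rng.gauss(theta[j], 1.0) for j in range(d)] for _ in range(n)]

    # Assumed adversary: overwrite floor(eps * n) samples with a fixed outlier.
    n_bad = int(eps * n)
    corrupted = set(rng.sample(range(n), n_bad))
    for i in corrupted:
        samples[i] = [10.0] * d

    return theta, samples, corrupted
```

A tester sees only `samples` and must decide between $\|\theta\|_2=0$ and $\|\theta\|_2>\gamma$ without knowing which indices are in `corrupted`.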