We study the problem of exact support recovery for high-dimensional sparse linear regression when the signals are weak, rare, and possibly heterogeneous. Specifically, we fix the minimum signal magnitude at the information-theoretic optimal rate and investigate the asymptotic selection accuracy of best subset selection (BSS) and marginal screening (MS) procedures under independent Gaussian design. Despite the ideal setup, somewhat surprisingly, marginal screening can fail to achieve exact recovery with probability converging to one in the presence of heterogeneous signals, whereas BSS enjoys model consistency whenever the minimum signal strength is above the information-theoretic threshold. To mitigate the computational cost of BSS, we also propose a surrogate two-stage algorithm called ETS (Estimate Then Screen), based on iterative hard thresholding and gradient coordinate screening, and we show that ETS shares exactly the same asymptotic optimality in terms of exact recovery as BSS. Finally, we present a simulation study comparing ETS with LASSO and marginal screening. The numerical results echo our asymptotic theory even for realistic values of the sample size, dimension, and sparsity.
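To make the two ingredients named above concrete, the following is a minimal NumPy sketch of marginal screening and of a two-stage "estimate, then screen" procedure. It is an illustration under stated assumptions, not the ETS algorithm as specified in the paper: the function names, the step sizes, the iteration count, and in particular the form of the second stage (ranking the coordinates of one gradient step taken from the hard-thresholded estimate) are all assumptions made here for illustration.

```python
import numpy as np


def marginal_screening(X, y, s):
    """Select the s coordinates with the largest marginal score |X_j^T y|."""
    scores = np.abs(X.T @ y)
    return np.sort(np.argsort(scores)[-s:])


def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out


def iht(X, y, k, n_iter=100):
    """Iterative hard thresholding for least squares with k-sparse iterates."""
    p = X.shape[1]
    # Conservative step size: 1 / (largest eigenvalue of X^T X).
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        beta = hard_threshold(beta - step * grad, k)
    return beta


def estimate_then_screen(X, y, s, k=None, n_iter=100):
    """Hypothetical two-stage sketch (an assumption, not the paper's exact ETS).

    Stage 1: estimate the coefficients with IHT at sparsity level k.
    Stage 2: take one gradient step from that estimate and keep the s
             coordinates with the largest magnitude.
    """
    n = X.shape[0]
    k = s if k is None else k
    beta_hat = iht(X, y, k, n_iter)
    # For a design with independent unit-variance columns, X^T X / n is close
    # to the identity, so 1/n is used as the step size here (an assumption).
    proxy = beta_hat + X.T @ (y - X @ beta_hat) / n
    return np.sort(np.argsort(np.abs(proxy))[-s:])
```

Both selectors return sorted index arrays, so they can be compared directly against a known support in a simulation with an independent Gaussian design matrix. The sparsity level `s` is treated as known in this sketch.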