Given $m$ $d$-dimensional response variables and $n$ $d$-dimensional predictors, sparse regression selects at most $k$ predictors for each response for linear approximation, where $1\leq k \leq d-1$. The key problem in sparse regression is subset selection, which usually suffers from high computational cost. In recent years, many improved approximate methods for subset selection have been published. However, less attention has been paid to non-approximate subset selection, which is necessary for many problems in data analysis. Here we consider sparse regression from the viewpoint of correlation and propose a formula for conditional uncorrelation. We then propose an efficient non-approximate subset-selection method that does not require computing any coefficients of the regression equation for candidate predictors. With the proposed method, the computational complexity for each candidate subset is reduced from $O(\frac{1}{6}k^3+mk^2+mkd)$ to $O(\frac{1}{6}k^3+\frac{1}{2}mk^2)$. Since the dimension $d$ is generally the number of observations or experiments and is typically large, the proposed method can greatly improve the efficiency of non-approximate subset selection.
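To make the complexity comparison concrete, here is an illustrative per-subset operation count under assumed values $m=10$, $k=5$, $d=1000$ (hypothetical numbers chosen only for illustration, not taken from the paper):
$$
\underbrace{\tfrac{1}{6}k^3 + mk^2 + mkd}_{\text{standard}} = \tfrac{125}{6} + 250 + 50000 \approx 5.0\times 10^{4},
\qquad
\underbrace{\tfrac{1}{6}k^3 + \tfrac{1}{2}mk^2}_{\text{proposed}} = \tfrac{125}{6} + 125 \approx 1.5\times 10^{2},
$$
so under these assumptions the per-subset cost drops by a factor of roughly $345$, and the saving grows linearly with the number of observations $d$ because the dominant $mkd$ term is eliminated.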