We describe an implicit sparsity-inducing mechanism based on minimization over a family of kernels: \begin{equation*} \min_{\beta, f}~\widehat{\mathbb{E}}[L(Y, f(\beta^{1/q} \odot X))] + \lambda_n \|f\|_{\mathcal{H}_q}^2~~\text{subject to}~~\beta \ge 0, \end{equation*} where $L$ is the loss, $\odot$ is coordinate-wise multiplication, and $\mathcal{H}_q$ is the reproducing kernel Hilbert space induced by the kernel $k_q(x, x') = h(\|x-x'\|_q^q)$, where $\|\cdot\|_q$ is the $\ell_q$ norm. Using gradient descent to optimize this objective with respect to $\beta$ leads to exactly sparse stationary points with high probability. The sparsity is achieved without using any of the well-known explicit sparsification techniques such as penalization (e.g., $\ell_1$), early stopping, or post-processing (e.g., clipping). As an application, we use this sparsity-inducing mechanism to build algorithms that are consistent for feature selection.
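To make the mechanism concrete, the following is a minimal numerical sketch (not the paper's implementation): it assumes $q = 1$, $h(t) = e^{-t}$, and the squared loss, so the inner minimization over $f$ is kernel ridge regression with a closed-form solution, through which we differentiate to run gradient descent on $\beta$. The function names, step size, and the projection used to handle the constraint $\beta \ge 0$ are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def weighted_kernel(X1, X2, beta, q=1.0):
    # K[i, j] = h( sum_k beta_k * |X1[i, k] - X2[j, k]|**q ) with h(t) = exp(-t),
    # i.e. k_q applied to beta**(1/q) ⊙ x and beta**(1/q) ⊙ x'.
    diffs = jnp.abs(X1[:, None, :] - X2[None, :, :]) ** q
    return jnp.exp(-jnp.sum(diffs * beta, axis=-1))

def objective(beta, X, y, lam):
    # Objective after eliminating f: for the squared loss, the inner
    # minimization over f in H_q is kernel ridge regression.
    n = X.shape[0]
    K = weighted_kernel(X, X, beta)
    alpha = jnp.linalg.solve(K + n * lam * jnp.eye(n), y)
    fit = jnp.mean((y - K @ alpha) ** 2)        # empirical squared loss
    penalty = lam * (alpha @ K @ alpha)         # lam * ||f||^2_{H_q}
    return fit + penalty

def fit_beta(X, y, lam=1e-2, lr=1e-1, steps=500):
    # Gradient descent on beta; the nonnegativity constraint is handled
    # here by a simple projection (an illustrative choice).
    beta = jnp.ones(X.shape[1])
    grad_fn = jax.jit(jax.grad(objective))
    for _ in range(steps):
        beta = jnp.maximum(beta - lr * grad_fn(beta, X, y, lam), 0.0)
    return beta
```

Under these choices the weighted kernel is $\exp(-\sum_j \beta_j |x_j - x_j'|)$, so a coordinate $\beta_j$ that reaches zero removes feature $j$ from the model entirely; this is the sense in which exactly sparse stationary points of the $\beta$-dynamics perform feature selection.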