We consider a contextual online learning (multi-armed bandit) problem with high-dimensional covariate $\mathbf{x}$ and decision $\mathbf{y}$. The reward function to learn, $f(\mathbf{x},\mathbf{y})$, does not have a particular parametric form. The literature has shown that the optimal regret is $\tilde{O}(T^{(d_x+d_y+1)/(d_x+d_y+2)})$, where $d_x$ and $d_y$ are the dimensions of $\mathbf x$ and $\mathbf y$, and thus it suffers from the curse of dimensionality. In many applications, only a small subset of variables in the covariate affect the value of $f$, which is referred to as \textit{sparsity} in statistics. To take advantage of the sparsity structure of the covariate, we propose a variable selection algorithm called \textit{BV-LASSO}, which incorporates novel ideas such as binning and voting to apply LASSO to nonparametric settings. Our algorithm achieves the regret $\tilde{O}(T^{(d_x^*+d_y+1)/(d_x^*+d_y+2)})$, where $d_x^*$ is the effective covariate dimension. The regret matches the optimal regret when the covariate is $d^*_x$-dimensional and thus cannot be improved. Our algorithm may serve as a general recipe to achieve dimension reduction via variable selection in nonparametric settings.
翻译:我们考虑到一个背景在线学习(多武装土匪)问题, 问题在于高维的 Covariate $\ mathbf{x} 美元和决定 $\ mathbf{y} 美元。 学习的奖赏函数, $f( mathbf{x},\ mathbf{y} 美元, 没有特定的参数格式。 文献显示, 最佳遗憾是 $\\\\ (d_ x+d_y+1) / (d_ x+_d_y+2} 美元 美元, 其中$_ xxx 美元和美元是 mathbf{x} 的维度。 学习的奖赏函数, $(mathbbbffff{x}, xxxx) 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 美元, 等