Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising. We derive a novel $\Omega(n^{2/3})$ dimension-free minimax regret lower bound for sparse linear bandits in the data-poor regime, where the horizon is smaller than the ambient dimension and the feature vectors admit a well-conditioned exploration distribution. This is complemented by a nearly matching upper bound for an explore-then-commit algorithm, showing that $\Theta(n^{2/3})$ is the optimal rate in the data-poor regime. The results complement existing bounds for the data-rich regime and provide another example where carefully balancing the trade-off between information and regret is necessary. Finally, we prove a dimension-free $O(\sqrt{n})$ regret upper bound under an additional assumption on the magnitude of the signal for relevant features.
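The explore-then-commit strategy behind the upper bound can be sketched as follows: explore for roughly $n^{2/3}$ rounds by sampling actions from an exploration distribution, fit a sparse estimate of the unknown parameter (here via a Lasso solved with ISTA), then commit to the greedy action for the remaining rounds. This is a minimal illustrative sketch, not the paper's exact algorithm; the uniform exploration distribution, the regularization level `lam`, and the helper names are assumptions made for the example.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso via proximal gradient (ISTA):
    minimize 0.5 * ||X @ theta - y||^2 + lam * ||theta||_1."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the smooth part
    theta = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y)
        z = theta - grad / L
        # Soft-thresholding: the proximal operator of the l1 penalty.
        theta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return theta

def explore_then_commit(actions, reward_fn, n, rng):
    """Explore uniformly for m ~ n^(2/3) rounds, fit a Lasso estimate
    of the parameter, then commit to the greedy action.

    actions:   (k, d) array of available feature vectors
    reward_fn: callable mapping an action vector to a noisy reward
    n:         horizon
    """
    k, d = actions.shape
    m = int(np.ceil(n ** (2 / 3)))  # exploration length, up to constants
    X, y = [], []
    for _ in range(m):
        a = actions[rng.integers(k)]  # sample from exploration distribution
        X.append(a)
        y.append(reward_fn(a))
    X, y = np.array(X), np.array(y)
    # Regularization level ~ sqrt(m log d); an illustrative choice.
    theta_hat = lasso_ista(X, y, lam=np.sqrt(m * np.log(d)))
    best = actions[np.argmax(actions @ theta_hat)]
    total = float(np.sum(y))
    for _ in range(n - m):  # commit phase
        total += reward_fn(best)
    return total, theta_hat
```

The $n^{2/3}$ split reflects the information/regret trade-off the abstract refers to: exploring longer improves the estimate but accrues linear regret during exploration, and balancing the two terms yields the $\Theta(n^{2/3})$ rate.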