Recent works in bandit problems adopted lasso convergence theory in the sequential decision-making setting. Even with fully observed contexts, there are technical challenges that hinder the application of existing lasso convergence theory: 1) proving the restricted eigenvalue condition under conditionally sub-Gaussian noise and 2) accounting for the dependence between the context variables and the chosen actions. This paper studies the effect of missing covariates on regret for stochastic linear bandit algorithms. Our work provides a high-probability upper bound on the regret incurred by the proposed algorithm in terms of covariate sampling probabilities, showing that the regret degrades due to missingness by at most $\zeta_{min}^2$, where $\zeta_{min}$ is the minimum probability of observing covariates in the context vector. We illustrate our algorithm for the practical application of experimental design for collecting gene expression data by a sequential selection of class discriminating DNA probes.
翻译:在连续决策环境中,最近关于土匪问题的著作采纳了拉索趋同理论。即使有完全观察到的背景,也存在技术挑战,阻碍了现有拉索趋同理论的应用:(1) 证明在条件性亚地(Gausian)噪声下受限制的乙基值条件,(2) 说明上下文变量和所选择的动作之间的依赖性。本文研究了缺失的共变因素对随机线性线性土匪算法的遗憾的影响。我们的工作提供了一种高概率上限,它取决于拟议算法在共变抽样概率方面引起的遗憾,表明由于损失最多为$\zeta ⁇ ⁇ ⁇ min ⁇ 2$($zeta ⁇ ⁇ min ⁇ 2$)而导致的遗憾降低,这是在上下文矢量中观察共变变量的最低概率。我们用顺序选择有区别的DNA探测器来说明实际应用实验设计收集基因表达数据的方法的算法。