We propose improved fixed-design confidence bounds for the linear logistic model. Our bounds significantly improve upon the state-of-the-art bound of Li et al. (2017) by leveraging recent developments in the self-concordant analysis of the logistic loss (Faury et al., 2020). Specifically, our confidence bound avoids a direct dependence on $1/\kappa$, where $\kappa$ is the minimal variance over all arms' reward distributions. In general, $1/\kappa$ scales exponentially with the norm of the unknown linear parameter $\theta^*$. Instead of relying on this worst-case quantity, our confidence bound for the reward of any given arm depends directly on the variance of that arm's reward distribution. We present two applications of our novel bounds, to pure exploration and to regret minimization in logistic bandits, both improving upon state-of-the-art performance guarantees. For pure exploration, we also provide a lower bound highlighting a dependence on $1/\kappa$ for a family of instances.
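To make the role of $\kappa$ concrete, the following minimal sketch computes per-arm reward variances and the worst-case quantity $1/\kappa$. It assumes Bernoulli rewards with mean $\sigma(x^\top \theta^*)$, where $\sigma$ is the logistic function. The arm set, the direction of $\theta^*$, and all function names are hypothetical choices made for illustration and are not taken from the paper.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def reward_variance(x, theta):
    """Variance of a Bernoulli reward with mean sigmoid(<x, theta>)."""
    mu = sigmoid(sum(xi * ti for xi, ti in zip(x, theta)))
    return mu * (1.0 - mu)

def kappa(arms, theta):
    """Minimal reward variance over all arms (the worst-case quantity)."""
    return min(reward_variance(x, theta) for x in arms)

# Hypothetical unit-norm arms in two dimensions.
arms = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]

# Scale theta along the first coordinate (an illustrative choice) and
# compare 1/kappa with the individual arm variances.
for scale in (1.0, 2.0, 4.0, 8.0):
    theta = (scale, 0.0)
    variances = [round(reward_variance(x, theta), 4) for x in arms]
    print(f"norm={scale:4.1f}  1/kappa={1.0 / kappa(arms, theta):10.2f}  "
          f"variances={variances}")
```

In this toy instance, $1/\kappa = 2 + e^{s} + e^{-s}$ when $\|\theta^*\| = s$, so it grows exponentially with the norm, while the arm orthogonal to $\theta^*$ keeps variance $1/4$ at every scale. This gap between the worst-case variance and an individual arm's variance is what an arm-dependent bound can exploit.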