We propose improved fixed-design confidence bounds for the linear logistic model. Our bounds significantly improve upon the state-of-the-art bounds of Li et al. (2017) by leveraging the self-concordance of the logistic loss, an approach inspired by Faury et al. (2020). Specifically, our confidence width does not scale with the problem-dependent parameter $1/\kappa$, where $\kappa$ is the worst-case variance of an arm reward. In the worst case, $1/\kappa$ grows exponentially with the norm of the unknown linear parameter $\theta^*$. Instead, our bound depends directly on the local variance induced by $\theta^*$. We apply our bounds to two logistic bandit problems: regret minimization and pure exploration. Our analysis shows that the new confidence bounds improve upon previous state-of-the-art performance guarantees.
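To make the role of $\kappa$ concrete, the following minimal sketch contrasts the worst-case variance with the local variance of a single arm. It is an illustration only, not the paper's method: the arm set, the choice of $\theta^*$, and the helper names `local_variance` and `kappa` are hypothetical, and $\kappa$ is assumed here to be the smallest Bernoulli reward variance $\mu(x^\top\theta^*)(1-\mu(x^\top\theta^*))$ over the arm set, with $\mu$ the logistic function.

```python
import math

def sigmoid(z):
    """Logistic function mu(z)."""
    return 1.0 / (1.0 + math.exp(-z))

def local_variance(x, theta):
    """Bernoulli reward variance mu(z) * (1 - mu(z)) at z = x^T theta."""
    z = sum(xi * ti for xi, ti in zip(x, theta))
    p = sigmoid(z)
    return p * (1.0 - p)

def kappa(arms, theta):
    """Smallest reward variance over the arm set (assumed definition)."""
    return min(local_variance(x, theta) for x in arms)

# Hypothetical two-arm problem in two dimensions with unit-norm arms.
arms = [(1.0, 0.0), (0.0, 1.0)]

for norm in (1.0, 5.0, 10.0):
    theta = (norm, 0.0)  # theta* aligned with the first arm
    k = kappa(arms, theta)
    v2 = local_variance(arms[1], theta)
    print(f"||theta*|| = {norm:4.1f}   1/kappa = {1.0 / k:10.1f}   "
          f"variance of arm 2 = {v2:.2f}")
```

In this toy setup, $1/\kappa$ blows up as $\|\theta^*\|$ grows, while the variance of the second arm stays at $1/4$. A width that depends on the local variance rather than on $1/\kappa$ would therefore not pay the worst-case price for that arm.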