Several optimism-based stochastic bandit algorithms -- including UCB, UCB-V, linear UCB, and finite-arm GP-UCB -- achieve logarithmic regret using proofs that, despite superficial differences, follow essentially the same structure. This note isolates the minimal ingredients behind these analyses: a single high-probability concentration condition on the estimators, after which logarithmic regret follows from two short deterministic lemmas describing radius collapse and optimism-forced deviations. The framework yields unified, near-minimal proofs for these classical algorithms and extends naturally to many contemporary bandit variants.
翻译:多种基于乐观策略的随机赌博机算法——包括UCB、UCB-V、线性UCB以及有限臂GP-UCB——均能通过对数遗憾界,其证明过程尽管存在表面差异,但本质上遵循相同的结构。本文提炼出这些分析背后的最小要素:仅需对估计量满足一个高概率集中性条件,随后通过两个描述半径收缩与乐观强制偏差的简短确定性引理,即可导出对数遗憾界。该框架为这些经典算法提供了统一且近乎最小化的证明,并能自然地扩展到许多当代赌博机变体。