Logistic Bandits have recently undergone careful scrutiny by virtue of their combined theoretical and practical relevance. This research effort delivered statistically efficient algorithms, improving the regret of previous strategies by exponentially large factors. Such algorithms are, however, strikingly costly, as they require $\Omega(t)$ operations at each round. On the other hand, a different line of research focused on computational efficiency ($\mathcal{O}(1)$ per-round cost), but at the price of forgoing the aforementioned exponential improvements. Obtaining the best of both worlds is unfortunately not a matter of marrying the two approaches. Instead, we introduce a new learning procedure for Logistic Bandits. It yields confidence sets whose sufficient statistics can be easily maintained online without sacrificing statistical tightness. Combined with efficient planning mechanisms, we design fast algorithms whose regret performance still matches the problem-dependent lower bound of Abeille et al. (2021). To the best of our knowledge, these are the first Logistic Bandit algorithms that simultaneously enjoy statistical and computational efficiency.
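To make the computational contrast concrete, here is a minimal sketch of what "sufficient statistics maintained online" can look like for a logistic model: a running (regularized) design matrix updated by a rank-one term and a single Newton-like parameter step each round, so the per-round cost depends on the dimension $d$ but not on $t$. This is purely illustrative under assumed choices (constant step size, a synthetic reward model); it is not the paper's actual confidence-set construction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 3
rng = np.random.default_rng(0)

theta = np.zeros(d)   # current parameter estimate
H = np.eye(d)         # regularized design matrix (the online sufficient statistic)
eta = 0.5             # step size (assumed constant, for illustration only)

for t in range(1000):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)                         # arm played this round
    r = rng.binomial(1, sigmoid(x @ np.ones(d)))   # synthetic Bernoulli reward
    p = sigmoid(x @ theta)
    H += p * (1.0 - p) * np.outer(x, x)            # rank-one update, O(d^2)
    theta += eta * np.linalg.solve(H, (r - p) * x) # one Newton-like step, O(d^3)
```

Each round costs $\mathcal{O}(d^3)$ independently of $t$; the statistically efficient but costly alternative would refit the maximum-likelihood estimate on the full history $\{(x_s, r_s)\}_{s \le t}$ every round, paying $\Omega(t)$ per iteration.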