Standard approaches to decision-making under uncertainty focus on sequential exploration of the space of decisions. However, \textit{simultaneously} proposing a batch of decisions, which leverages available resources for parallel experimentation, has the potential to rapidly accelerate exploration. We present a family of (parallel) contextual linear bandit algorithms whose regret is nearly identical to that of their perfectly sequential counterparts -- given access to the same total number of oracle queries -- up to a lower-order "burn-in" term that depends on the context-set geometry. We provide matching information-theoretic lower bounds on parallel regret performance, establishing that our algorithms are asymptotically optimal in the time horizon. Finally, we present an empirical evaluation of these parallel algorithms in several domains, including materials discovery and biological sequence design problems, to demonstrate the utility of parallelized bandits in practical settings.