Standard approaches to decision-making under uncertainty focus on sequential exploration of the space of decisions. However, \textit{simultaneously} proposing a batch of decisions, which leverages available resources for parallel experimentation, has the potential to rapidly accelerate exploration. We present a family of (parallel) contextual bandit algorithms, applicable to problems with bounded eluder dimension, whose regret is nearly identical to that of their perfectly sequential counterparts -- given access to the same total number of oracle queries -- up to a lower-order ``burn-in'' term. We further show that these algorithms can be specialized to the class of linear reward functions, where we introduce and analyze several new linear bandit algorithms that explicitly incorporate diversity into their action selection. Finally, we present an empirical evaluation of these parallel algorithms in several domains, including materials discovery and biological sequence design, to demonstrate the utility of parallelized bandits in practical settings.
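To make the batched-selection idea concrete, the following is a minimal sketch (not the paper's algorithm) of a linear-UCB batch rule that injects diversity: within a batch, the design matrix is updated with each chosen action's features before any reward is observed, which shrinks the exploration bonus of near-duplicate actions. The function name \texttt{select\_batch} and all constants are illustrative assumptions, not artifacts of this work.

\begin{verbatim}
import numpy as np

def select_batch(actions, theta_hat, V, beta=1.0, batch_size=5):
    """Pick `batch_size` rows of `actions` (n x d) by UCB score,
    hallucinating covariance updates so similar actions are penalized."""
    V = V.copy()
    chosen = []
    for _ in range(batch_size):
        V_inv = np.linalg.inv(V)
        # Exploration bonus: sqrt(x^T V^{-1} x) for every candidate action.
        bonus = beta * np.sqrt(np.einsum("ij,jk,ik->i", actions, V_inv, actions))
        ucb = actions @ theta_hat + bonus
        idx = int(np.argmax(ucb))
        chosen.append(idx)
        x = actions[idx]
        V = V + np.outer(x, x)  # covariance-only update: encourages diversity
    return chosen

# Example usage with random features.
rng = np.random.default_rng(0)
d, n = 8, 100
actions = rng.normal(size=(n, d))
theta_hat = rng.normal(size=d)
print(select_batch(actions, theta_hat, np.eye(d)))
\end{verbatim}

Under these assumptions, the whole batch is proposed before any feedback arrives, matching the parallel-experimentation setting described above; the fully sequential variant would instead update both \texttt{theta\_hat} and \texttt{V} with an observed reward after every single action.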