We study best arm identification in a federated multi-armed bandit setting with a central server and multiple clients, where each client has access to a {\em subset} of arms and each arm yields independent Gaussian observations. The {\em reward} from an arm at any given time is defined as the average of the observations generated at that time across all the clients that have access to the arm. The goal is to identify the best arm (the arm with the largest mean reward) of each client with the smallest expected stopping time, subject to an upper bound on the error probability (i.e., the {\em fixed-confidence regime}). We provide a lower bound on the growth rate of the expected time to find the best arm of each client. Furthermore, we show that for any algorithm whose upper bound on the expected time to find the best arms matches the lower bound up to a multiplicative constant, the ratio of any two consecutive communication time instants must be bounded, a result that is of independent interest. We then provide the first known lower bound on the expected number of {\em communication rounds} required to find the best arms. We propose a novel algorithm based on the well-known {\em Track-and-Stop} strategy that communicates only at exponential time instants, and derive asymptotic upper bounds on its expected time to find the best arms and on its expected number of communication rounds, where the asymptotics are taken as the error probability vanishes.
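The reward definition and the idea of communicating only at exponential time instants can be illustrated with a minimal sketch. It is not the algorithm of the paper: the helper names (`mean_rewards`, `best_arms`, `communication_instants`), the schedule form $\lceil (1+\lambda)^r \rceil$, and the parameter `lam` are assumptions introduced here for illustration, and the true means are taken as known rather than estimated from samples.

```python
import math


def mean_rewards(client_means, access):
    """Mean reward of each arm: the average of its mean observations
    over all clients that have access to it.

    client_means[m][k] is the (hypothetical) mean observation of arm k
    at client m; access[m] is the set of arms available to client m.
    """
    arms = set().union(*access)
    rewards = {}
    for k in arms:
        holders = [m for m, subset in enumerate(access) if k in subset]
        rewards[k] = sum(client_means[m][k] for m in holders) / len(holders)
    return rewards


def best_arms(client_means, access):
    """Best arm of each client: the arm in the client's own subset
    with the largest mean reward."""
    rewards = mean_rewards(client_means, access)
    return [max(subset, key=lambda k: rewards[k]) for subset in access]


def communication_instants(horizon, lam=1.0):
    """Exponentially spaced instants ceil((1 + lam)^r) up to `horizon`.

    The schedule form is an assumption for illustration; duplicates
    produced by the ceiling are skipped.
    """
    instants = []
    r = 0
    while True:
        t = math.ceil((1 + lam) ** r)
        if t > horizon:
            break
        if not instants or t > instants[-1]:
            instants.append(t)
        r += 1
    return instants


# Two clients sharing arm 1; arms 0 and 2 are private.
access = [{0, 1}, {1, 2}]
client_means = [{0: 1.0, 1: 0.5}, {1: 2.0, 2: 1.5}]
print(mean_rewards(client_means, access))
print(best_arms(client_means, access))
print(communication_instants(20, lam=1.0))
```

In this toy instance the shared arm has mean reward $(0.5 + 2.0)/2 = 1.25$, so it is the best arm for the first client even though that client's own observations of it have the lower mean. With `lam=1.0` the communication instants double each round, so the ratio of consecutive instants stays bounded while the number of rounds grows only logarithmically in the horizon.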