We study best arm identification in a federated multi-armed bandit setting with a central server and multiple clients, when each client has access to a {\em subset} of arms and each arm yields independent Gaussian observations. The goal is to identify the best arm of each client subject to an upper bound on the error probability; here, the best arm is one that has the largest {\em average} value of the means averaged across all clients having access to the arm. Our interest is in the asymptotics as the error probability vanishes. We provide an asymptotic lower bound on the growth rate of the expected stopping time of any algorithm. Furthermore, we show that for any algorithm whose upper bound on the expected stopping time matches with the lower bound up to a multiplicative constant ({\em almost-optimal} algorithm), the ratio of any two consecutive communication time instants must be {\em bounded}, a result that is of independent interest. We thereby infer that an algorithm can communicate no more sparsely than at exponential time instants in order to be almost-optimal. For the class of almost-optimal algorithms, we present the first-of-its-kind asymptotic lower bound on the expected number of {\em communication rounds} until stoppage. We propose a novel algorithm that communicates at exponential time instants, and demonstrate that it is asymptotically almost-optimal.
翻译:暂无翻译