The stochastic $K$-armed bandit problem has been studied extensively due to its applications in various domains ranging from online advertising to clinical trials. In practice however, the number of arms can be very large resulting in large memory requirements for simultaneously processing them. In this paper we consider a streaming setting where the arms are presented in a stream and the algorithm uses limited memory to process these arms. Here, the goal is not only to minimize regret, but also to do so in minimal memory. Previous algorithms for this problem operate in one of the two settings: they either use $\Omega(\log \log T)$ passes over the stream (Rathod, 2021; Chaudhuri and Kalyanakrishnan, 2020; Liau et al., 2018), or just a single pass (Maiti et al., 2021). In this paper we study the trade-off between memory and regret when $B$ passes over the stream are allowed, for any $B \geq 1$, and establish tight regret upper and lower bounds for any $B$-pass algorithm. Our results uncover a surprising *sharp transition phenomenon*: $O(1)$ memory is sufficient to achieve $\widetilde\Theta\Big(T^{\frac{1}{2} + \frac{1}{2^{B+2}-2}}\Big)$ regret in $B$ passes, and increasing the memory to any quantity that is $o(K)$ has almost no impact on further reducing this regret, unless we use $\Omega(K)$ memory. Our main technical contribution is our lower bound which requires the use of information-theoretic techniques as well as ideas from round elimination to show that the *residual problem* remains challenging over subsequent passes.
翻译:由于在从在线广告到临床试验等不同领域的应用,对武装匪徒问题进行了广泛的研究。但在实践中,武器数量可能非常大,导致同时处理武器需要大量记忆。在本文件中,我们考虑的是武器在流中展示的流状环境,算法使用有限的记忆处理这些武器。在这里,我们的目标不仅是尽量减少遗憾,而且是在最小的记忆中这样做。这个问题的前算法在两种环境中之一运作:它们要么使用美元\Omega(log T),在流中传递(Rathod,2021年;Chaudhuri和Kalyanakrishnan,2020年;Liau等人,2018年),或者仅仅使用一个流流式环境(Maiti等人,2021年)。 在本文件中,我们不仅研究记忆和遗憾之间的交易,对于任何B2美元,而且对于任何美元进入流的算法来说,它们都具有强烈的遗憾。 我们发现一个令人惊讶的内存和内存流技术需要继续使用。