We give an $(\varepsilon,\delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model with a distribution-dependent regret of $O\left(\left(\sum_{a\in [k]:\Delta_a>0}\frac{\log T}{\Delta_a}\right)+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\varepsilon}\right)$, and a distribution-independent regret of $O\left(\sqrt{kT\log T}+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\varepsilon}\right)$, where $T$ is the number of rounds, $\Delta_a$ is the suboptimality gap of arm $a$, and $k$ is the total number of arms. Our upper bounds almost match the regret of the best known algorithms in the centralized model, and significantly outperform the best known algorithms in the local model.
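To make the scaling explicit (an illustrative rewriting of the displayed bound, not an additional claim): for fixed privacy parameters $\varepsilon$ and $\delta$, the additive privacy cost in the distribution-dependent bound grows only logarithmically in $T$, so the bound can be factored as
\[
O\!\left(\left(\sum_{a\in[k]:\Delta_a>0}\frac{1}{\Delta_a}+\frac{k\sqrt{\log\frac{1}{\delta}}}{\varepsilon}\right)\log T\right),
\]
which matches the optimal non-private $\Omega(\log T)$ rate up to the additive $\frac{k\sqrt{\log(1/\delta)}}{\varepsilon}$ factor inside the parentheses.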