Recent works have shown that agents facing independent instances of a stochastic $K$-armed bandit can collaborate to decrease regret. However, these works assume that each agent always recommends their individual best-arm estimates to other agents, which is unrealistic in envisioned applications (machine faults in distributed computing or spam in social recommendation systems). Hence, we generalize the setting to include $n$ honest and $m$ malicious agents who recommend best-arm estimates and arbitrary arms, respectively. We first show that even with a single malicious agent, existing collaboration-based algorithms fail to improve regret guarantees over a single-agent baseline. We propose a scheme where honest agents learn who is malicious and dynamically reduce communication with (i.e., "block") them. We show that collaboration indeed decreases regret for this algorithm, assuming $m$ is small compared to $K$ but without assumptions on malicious agents' behavior, thus ensuring that our algorithm is robust against any malicious recommendation strategy.