Online learning has been successfully applied to many problems in which data are revealed over time. In this paper, we provide a general framework for studying multi-agent online learning problems in the presence of delays and asynchronicities. Specifically, we propose and analyze a class of adaptive dual averaging schemes in which agents only need to accumulate gradient feedback received from the whole system, without requiring any inter-agent coordination. In the single-agent case, the adaptivity of the proposed method allows us to extend a range of existing results to problems with potentially unbounded delays between playing an action and receiving the corresponding feedback. In the multi-agent case, the situation is significantly more complicated because agents may not have access to a global clock to use as a reference point; to overcome this, we focus on the information that is available for producing each prediction rather than the actual delay associated with each feedback. This allows us to derive adaptive learning strategies with optimal regret bounds, at both the agent and network levels. Finally, we also analyze an "optimistic" variant of the proposed algorithm that can exploit the predictability of slowly varying problems and leads to improved regret bounds.
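To make the single-agent setting concrete, the following is a minimal sketch of dual averaging with delayed gradient feedback. It is not the algorithm analyzed in the paper: the quadratic losses, the Euclidean-ball constraint, the AdaGrad-style step size, and the names `project_ball` and `delayed_dual_averaging` are all assumptions chosen for illustration. The point it illustrates is that the learner only accumulates the gradients it has actually received, and the step size is computed from that received information alone rather than from the delays themselves.

```python
import math


def project_ball(v, radius):
    """Euclidean projection of v onto the ball of the given radius."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm <= radius:
        return list(v)
    return [c * radius / norm for c in v]


def delayed_dual_averaging(targets, delays, radius=1.0):
    """Dual averaging on the illustrative losses f_t(x) = 0.5 * ||x - targets[t]||^2.

    The gradient generated in round t is only usable from round
    t + 1 + delays[t] onward (delays[t] = 0 is the usual undelayed case).
    Returns the list of actions played.
    """
    dim = len(targets[0])
    grad_sum = [0.0] * dim   # sum of all gradients received so far
    sq_norm_sum = 0.0        # sum of squared norms of received gradients
    pending = {}             # arrival round -> gradients arriving then
    plays = []

    for t, (target, delay) in enumerate(zip(targets, delays)):
        # Absorb whatever feedback arrives at the start of this round.
        for g in pending.pop(t, []):
            grad_sum = [s + gi for s, gi in zip(grad_sum, g)]
            sq_norm_sum += sum(gi * gi for gi in g)

        # Assumed step-size rule: depends only on received feedback,
        # not on the (possibly unknown) delays.
        eta = radius / math.sqrt(1.0 + sq_norm_sum)

        # Dual averaging step with a Euclidean regularizer.
        x = project_ball([-eta * s for s in grad_sum], radius)
        plays.append(x)

        # The gradient at x is generated now but delivered later.
        grad = [xi - yi for xi, yi in zip(x, target)]
        pending.setdefault(t + 1 + delay, []).append(grad)

    return plays


if __name__ == "__main__":
    # Constant target, first gradient delayed by one extra round.
    for action in delayed_dual_averaging([[1.0]] * 3, [1, 0, 0]):
        print(action)
```

In this sketch the learner keeps playing its initial point until the first feedback arrives, which is the behavior one would expect when nothing has been observed yet. A multi-agent version would additionally need to route gradients between agents, which is omitted here.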