We study the interplay between feedback and communication in a cooperative online learning setting where a network of agents solves a task in which the learners' feedback is determined by an arbitrary graph. We characterize regret in terms of the independence number of the strong product between the feedback graph and the communication network. Our analysis recovers as special cases many previously known bounds for distributed online learning with either expert or bandit feedback. A more detailed version of our results also captures the dependence of the regret on the delay caused by the time the information takes to traverse each graph. Experiments run on synthetic data show that the empirical behavior of our algorithm is consistent with the theoretical results.
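To make the central quantity concrete, the following is a minimal sketch of the strong product of two graphs and a brute-force independence number, using the standard graph-theoretic definitions. It is not the paper's algorithm. The function names `strong_product` and `independence_number`, the toy graphs, and the choice of a 3-action path as the feedback graph with a 2-agent clique as the communication network are all illustrative assumptions. The exact graphs that enter the regret bound (for example, any graph powers used to model delay) may differ from this plain product.

```python
from itertools import combinations, product


def strong_product(nodes_g, edges_g, nodes_h, edges_h):
    """Return (nodes, adjacency) of the strong product of G and H.

    Two distinct pairs (u, v) and (u2, v2) are adjacent when u and u2 are
    equal or adjacent in G, and v and v2 are equal or adjacent in H.
    """
    adj_g = {frozenset(e) for e in edges_g}
    adj_h = {frozenset(e) for e in edges_h}

    def close(a, b, adj):
        # "Equal or adjacent" within one factor graph.
        return a == b or frozenset((a, b)) in adj

    nodes = list(product(nodes_g, nodes_h))
    adjacency = {p: set() for p in nodes}
    for p, q in combinations(nodes, 2):
        if close(p[0], q[0], adj_g) and close(p[1], q[1], adj_h):
            adjacency[p].add(q)
            adjacency[q].add(p)
    return nodes, adjacency


def independence_number(nodes, adjacency):
    """Brute-force size of a largest independent set (tiny graphs only)."""
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(q not in adjacency[p] for p, q in combinations(subset, 2)):
                return size
    return 0


# Hypothetical toy instance: feedback graph = path over actions 0-1-2,
# communication network = two connected agents "a" and "b".
actions, feedback_edges = [0, 1, 2], [(0, 1), (1, 2)]
agents, comm_edges = ["a", "b"], [("a", "b")]

nodes, adjacency = strong_product(actions, feedback_edges, agents, comm_edges)
print(independence_number(nodes, adjacency))  # 2
```

In this toy instance the two agents are connected, so the product has the same independence number as the feedback graph alone (2). With both graphs edgeless (no side observations and no communication), the same computation gives the number of actions times the number of agents. The brute-force search is exponential and only meant for very small graphs.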