In this preliminary (and unpolished) version of the paper, we study an asynchronous online learning setting with a network of agents. At each time step, a subset of the agents is activated; each activated agent makes a prediction and pays the corresponding loss. Feedback is then revealed to these agents and is later propagated through the network. We consider the cases of full, bandit, and semi-bandit feedback. In particular, we construct a reduction to delayed single-agent learning that applies to both the full and the bandit feedback cases and allows us to obtain regret guarantees for both settings. We complement these results with a near-matching lower bound.
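To make the interaction protocol concrete, the following is a minimal illustrative sketch (not the paper's algorithm): agents on a small communication graph each run a standard exponential-weights (Hedge) learner, one agent is activated per round under full-information feedback, and the revealed loss vector propagates through the network with a delay equal to graph distance. The specific graph, activation rule, delay model, and learning rate are assumptions made only for this demo.

```python
import math
import random

random.seed(0)

K, T, ETA = 3, 200, 0.3   # actions, horizon, Hedge learning rate (demo values)

# Small fixed communication graph over 4 agents (adjacency lists), assumed for the demo.
GRAPH = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def bfs_distances(src):
    """Graph distances from `src`; used here as feedback propagation delays."""
    dist, frontier = {src: 0}, [src]
    while frontier:
        nxt = []
        for u in frontier:
            for v in GRAPH[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return dist

class HedgeAgent:
    """Exponential-weights learner that accepts (possibly delayed) full-information feedback."""
    def __init__(self):
        self.weights = [1.0] * K

    def predict(self):
        total = sum(self.weights)
        return random.choices(range(K), [w / total for w in self.weights])[0]

    def update(self, losses):
        self.weights = [w * math.exp(-ETA * l) for w, l in zip(self.weights, losses)]

agents = {v: HedgeAgent() for v in GRAPH}
inbox = {v: [] for v in GRAPH}   # pending (arrival_time, loss_vector) messages per agent
total_loss = 0.0

for t in range(T):
    # Deliver feedback whose propagation delay has elapsed.
    for v in GRAPH:
        due = [lv for arr, lv in inbox[v] if arr <= t]
        inbox[v] = [(arr, lv) for arr, lv in inbox[v] if arr > t]
        for lv in due:
            agents[v].update(lv)

    # One agent is activated, predicts, and pays its loss on an adversarial loss vector.
    active = random.choice(list(GRAPH))
    losses = [random.random() for _ in range(K)]
    total_loss += losses[agents[active].predict()]

    # The loss vector then propagates through the network with graph-distance delay.
    for u, d in bfs_distances(active).items():
        inbox[u].append((t + 1 + d, losses))

print(f"cumulative loss over {T} rounds: {total_loss:.1f}")
```

In this toy setup each agent is effectively a single-agent learner facing delayed feedback, which is the viewpoint behind the reduction mentioned in the abstract; the bandit and semi-bandit cases would differ only in what part of the loss vector is revealed.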