We present an approach for quantifying the usefulness of transfer in reinforcement learning via regret bounds for a multi-agent setting. Considering a number of $\aleph$ agents operating in the same Markov decision process, though possibly with different reward functions, we examine the regret each agent suffers with respect to an optimal policy maximizing her average reward. We show that when the agents share their observations, the total regret of all agents is smaller by a factor of $\sqrt{\aleph}$ compared to the case where each agent has to rely only on the information she collects herself. This result demonstrates how considering regret in multi-agent settings can provide theoretical bounds on the benefit of sharing observations in transfer learning.
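To see where the $\sqrt{\aleph}$ factor comes from, the following back-of-the-envelope sketch may help; it assumes that a single agent's regret over $T$ steps scales as $\tilde{O}(\sqrt{T})$, as for UCRL2-style bounds, which is an illustrative assumption rather than a claim taken from the abstract. Without sharing, each of the $\aleph$ agents must learn the transition dynamics from her own $T$ observations, so the total regret adds up to
\[
  \aleph \cdot \tilde{O}\big(\sqrt{T}\big) \;=\; \tilde{O}\big(\aleph \sqrt{T}\big).
\]
With sharing, the agents jointly collect $\aleph T$ observations of the same dynamics, so the total regret behaves as
\[
  \tilde{O}\big(\sqrt{\aleph T}\big) \;=\; \tilde{O}\!\left(\frac{\aleph \sqrt{T}}{\sqrt{\aleph}}\right),
\]
i.e., an improvement by a factor of $\sqrt{\aleph}$.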