We study multi-task reinforcement learning (RL) in tabular episodic Markov decision processes (MDPs). We formulate a heterogeneous multi-player RL problem, in which a group of players concurrently face similar but not necessarily identical MDPs, with the goal of improving their collective performance through inter-player information sharing. We design and analyze an algorithm based on the idea of model transfer, and provide gap-dependent and gap-independent upper and lower bounds that characterize the intrinsic complexity of the problem.