Covering option discovery has been developed to improve exploration in single-agent reinforcement learning with sparse reward signals, by connecting the most distant states in the embedding space given by the Fiedler vector of the state transition graph. However, these option discovery methods cannot be directly extended to multi-agent scenarios, since the joint state space grows exponentially with the number of agents in the system. Consequently, existing research on options in multi-agent settings still relies on single-agent option discovery and fails to directly discover joint options that improve the connectivity of the agents' joint state space. In this paper, we show that it is indeed possible to directly compute multi-agent options with collaborative exploratory behaviors among agents, while still enjoying the ease of decomposition. Our key idea is to approximate the joint state space as a Kronecker graph, that is, the Kronecker product of the individual agents' state transition graphs. Based on this approximation, we can directly estimate the Fiedler vector of the joint state space from the Laplacian spectra of the individual agents' transition graphs. This decomposition enables us to efficiently construct multi-agent joint options by encouraging agents to connect the sub-goal joint states that correspond to the minimum or maximum values of the estimated joint Fiedler vector. Evaluation on multi-agent collaborative tasks shows that the proposed algorithm can successfully identify multi-agent options and significantly outperforms prior work using single-agent options or no options, in terms of both faster exploration and higher cumulative rewards.
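The decomposition idea can be illustrated with a small sketch. It is not the authors' estimator. It rests on several assumptions of our own. First, the joint adjacency is exactly `kron(A1, A2)`. Second, each agent's graph includes self-loops, so an agent may stay in place; without them, the Kronecker product of two bipartite corridors is disconnected. Third, the normalized Laplacian `I - D^{-1/2} A D^{-1/2}` is used. Under these assumptions the joint degree matrix is also a Kronecker product, so the joint eigenpairs factor exactly as `(1 - alpha_i * beta_j, kron(x_i, y_j))`. This means the joint Fiedler vector can be read off the factors' spectra without ever forming the joint Laplacian. The corridor environment and the names `path_graph_with_self_loops`, `kronecker_fiedler`, and `subgoal_joint_states` are hypothetical.

```python
import itertools

import numpy as np


def path_graph_with_self_loops(n):
    """Hypothetical toy environment: an n-state corridor where an agent may stay or move one step."""
    a = np.eye(n)
    for k in range(n - 1):
        a[k, k + 1] = a[k + 1, k] = 1.0
    return a


def normalized_adjacency(a):
    """Return D^{-1/2} A D^{-1/2}, where D holds the row sums of A."""
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]


def kronecker_fiedler(a1, a2):
    """Estimate the joint Fiedler pair using only the two factors' spectra.

    Assumption: joint adjacency = kron(a1, a2) and the normalized Laplacian is used,
    so the joint eigenpairs are (1 - alpha_i * beta_j, kron(x_i, y_j)).
    """
    alpha, x = np.linalg.eigh(normalized_adjacency(a1))
    beta, y = np.linalg.eigh(normalized_adjacency(a2))
    # Enumerate every joint eigenvalue together with the factor indices that produce it.
    pairs = sorted(
        ((1.0 - alpha[i] * beta[j], i, j)
         for i, j in itertools.product(range(len(alpha)), range(len(beta)))),
        key=lambda t: t[0],
    )
    # pairs[0] is the trivial eigenvalue 0; pairs[1] is the Fiedler (second-smallest) pair.
    lam, i, j = pairs[1]
    return lam, np.kron(x[:, i], y[:, j])


def subgoal_joint_states(fiedler, n2):
    """Map the argmin/argmax of the joint Fiedler vector back to per-agent states (s1, s2)."""
    lo, hi = int(np.argmin(fiedler)), int(np.argmax(fiedler))
    # Entry k of kron(x, y) corresponds to joint state (k // n2, k % n2).
    return divmod(lo, n2), divmod(hi, n2)


if __name__ == "__main__":
    a1, a2 = path_graph_with_self_loops(4), path_graph_with_self_loops(5)
    lam, f = kronecker_fiedler(a1, a2)
    print("estimated joint Fiedler value:", lam)
    print("sub-goal joint states:", subgoal_joint_states(f, a2.shape[0]))
```

The two extreme joint states returned by `subgoal_joint_states` play the role of the sub-goals that a joint option would try to connect. The cost is two per-agent eigendecompositions plus a sort over `n1 * n2` scalar products, rather than an eigendecomposition of the full joint graph. When the factors share an eigenvalue (for example, identical agents), the Fiedler eigenvalue is repeated, and this sketch simply returns one vector from that eigenspace.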