Reinforcement learning (RL) relies heavily on exploration to learn from its environment and to maximize observed rewards. It is therefore essential to design a reward function that guarantees optimal learning from the received experience. Previous work has combined automata- and logic-based reward shaping with environment assumptions to provide a mechanism for automatically synthesizing the reward function from the task. However, there is limited work on extending logic-based reward shaping to Multi-Agent Reinforcement Learning (MARL). When a task requires cooperation, the environment must track the joint state of all agents, and it therefore suffers from the curse of dimensionality with respect to the number of agents. This project explores how logic-based reward shaping for MARL can be designed for different scenarios and tasks. We present a novel method for semi-centralized, logic-based MARL reward shaping that is scalable in the number of agents, and we evaluate it in multiple scenarios.