We propose a model-free reinforcement learning architecture, called the distributed attentional actor architecture after conditional attention (DA6-X), to provide better interpretability of conditional coordinated behaviors. The underlying principle is to reuse the saliency vector, which represents conditional states of the environment, such as the global positions of agents. Because this flexibility is built into their policies, agents with DA6-X achieve superior performance by taking the additional information in the conditional states into account during decision making. We experimentally evaluated the effectiveness of the proposed method by comparing it with conventional methods in an object collection game. By visualizing the attention weights of DA6-X, we confirmed that agents successfully learn situation-dependent coordinated behaviors by correctly identifying various conditional states, which improves their interpretability alongside their performance.
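To make the underlying idea concrete, the following is a minimal sketch of attention conditioned on a saliency vector: the conditional-state (saliency) vector forms the query, per-entity observation features form the keys and values, and the resulting attention weights are the quantities one would visualize for interpretability. This is an illustrative toy, not the DA6-X implementation; the function name, dimensions, and random projection matrices are assumptions for the example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def conditional_attention(saliency, entity_feats, d_k=4, seed=0):
    """Toy conditional attention (not the paper's architecture).

    saliency:     (d,)  conditional-state vector, reused as the query source
    entity_feats: (n, d) per-entity observation features (keys and values)
    Returns (weights, context): weights are the inspectable attention
    weights over entities; context is their weighted combination.
    """
    rng = np.random.default_rng(seed)
    W_q = rng.standard_normal((saliency.shape[0], d_k))      # query projection
    W_k = rng.standard_normal((entity_feats.shape[1], d_k))  # key projection
    q = saliency @ W_q                     # query from the conditional state
    K = entity_feats @ W_k                 # one key per entity
    weights = softmax(K @ q / np.sqrt(d_k))  # scaled dot-product attention
    context = weights @ entity_feats       # value aggregation
    return weights, context
```

Visualizing `weights` per decision step is what reveals which conditional states an agent attends to, which is the interpretability mechanism the abstract refers to.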