Multi-agent reinforcement learning (MARL) has recently achieved tremendous success in a wide range of fields. However, with black-box neural network architectures, existing MARL methods make decisions in an opaque fashion that hinders humans from understanding the learned knowledge and how input observations influence decisions. Our solution is MIXing Recurrent soft decision Trees (MIXRTs), a novel interpretable architecture that represents explicit decision processes via the root-to-leaf paths of decision trees. We introduce a novel recurrent structure into soft decision trees to address partial observability, and estimate the joint action value by linearly mixing the outputs of recurrent trees based only on local observations. Theoretical analysis shows that MIXRTs satisfies the structural constraints of additivity and monotonicity in factorization. We evaluate MIXRTs on a range of challenging StarCraft II tasks. Experimental results show that our interpretable learning framework achieves performance competitive with widely investigated baselines, while delivering more straightforward explanations and domain knowledge of the decision process.
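To make the two core ingredients concrete, the following is a minimal NumPy sketch (not the authors' implementation; all weights are random placeholders rather than learned parameters, and the recurrent component is omitted): each inner node of a soft decision tree applies a sigmoid gate, the tree output is a probability-weighted sum over root-to-leaf paths, and local values are combined by a linear mixer whose absolute-valued weights enforce the monotonicity constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_decision_tree(x, depth=2):
    """Route an observation x through a soft decision tree of the given depth.

    Each inner node computes a sigmoid gating probability; the output is the
    sum of leaf values weighted by the probability of each root-to-leaf path.
    Weights here are random stand-ins for learned parameters (illustrative only).
    """
    n_inner = 2 ** depth - 1                     # number of gating (inner) nodes
    n_leaf = 2 ** depth                          # number of leaves
    W = rng.normal(size=(n_inner, x.size))       # hypothetical gate weights
    b = rng.normal(size=n_inner)
    leaf_values = rng.normal(size=n_leaf)        # hypothetical per-leaf value estimates

    gates = 1.0 / (1.0 + np.exp(-(W @ x + b)))   # sigmoid gating probabilities

    # Probability of reaching each leaf = product of gate probabilities on its path.
    path_probs = np.ones(n_leaf)
    for leaf in range(n_leaf):
        node = 0
        for d in range(depth):
            go_right = (leaf >> (depth - 1 - d)) & 1
            p = gates[node]
            path_probs[leaf] *= p if go_right else (1.0 - p)
            node = 2 * node + 1 + go_right       # descend to left/right child
    return path_probs @ leaf_values              # expected value over all paths

def mix_joint_q(local_qs, mix_w):
    """Linearly mix per-agent values with non-negative (|w|) weights, so that
    dQ_tot / dQ_i >= 0 -- the additivity/monotonicity constraint."""
    return np.abs(mix_w) @ local_qs

# Usage: three agents, each evaluating a 4-dimensional local observation.
q_locals = np.array([soft_decision_tree(rng.normal(size=4)) for _ in range(3)])
q_tot = mix_joint_q(q_locals, rng.normal(size=3))
```

Because the mixing weights enter only through their absolute value, raising any agent's local value can never lower the joint value, which is the monotonicity property the abstract refers to; interpretability comes from reading off the sigmoid gate activations along each root-to-leaf path.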