We present a two-step hybrid reinforcement learning (RL) policy designed to generate interpretable and robust hierarchical policies for RL problems with graph-based input. Unlike prior deep RL policies parameterized by an end-to-end black-box graph neural network, our approach disentangles the decision-making process into two steps. The first step is a simplified classification problem that maps the graph input to an action group in which all actions share similar semantics. The second step implements a sophisticated rule miner that conducts explicit one-hop reasoning over the graph and identifies the decisive edges in the graph input without requiring heavy domain knowledge. This two-step hybrid policy yields human-friendly interpretations and achieves better generalization and robustness. Extensive experiments on four levels of complex text-based games demonstrate the superiority of the proposed method over state-of-the-art baselines.
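To make the two-step decomposition concrete, the sketch below illustrates one possible reading of the pipeline: a coarse classifier first picks an action group from the graph observation, and a rule-based second stage then scans one-hop edges to select the decisive evidence for a concrete action. All names (`Edge`, `classify_action_group`, `mine_decisive_edges`, the relation labels) are hypothetical placeholders, not the paper's actual architecture; the classifier is replaced here by a trivial heuristic purely for illustration.

```python
# Minimal sketch of a two-step hybrid policy over a graph observation.
# Hypothetical names and rules; the learned components of the actual
# method are stubbed out with simple heuristics.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Edge:
    src: str       # subject entity
    relation: str  # relation label
    dst: str       # object entity

def classify_action_group(graph: List[Edge], groups: List[str]) -> str:
    """Step 1: map the graph input to a coarse action group.
    A keyword heuristic stands in for the learned classifier."""
    relations = {e.relation for e in graph}
    if "needs" in relations:
        return "take"
    if "is_in" in relations:
        return "go"
    return groups[0]

def mine_decisive_edges(graph: List[Edge], group: str) -> List[Edge]:
    """Step 2: explicit one-hop reasoning -- keep edges whose relation
    is consistent with the chosen action group."""
    relevant = {"take": {"needs", "has"}, "go": {"is_in", "connects"}}
    return [e for e in graph if e.relation in relevant.get(group, set())]

def act(graph: List[Edge]) -> Tuple[str, List[Edge]]:
    """Compose a concrete action and return the decisive edges as explanation."""
    group = classify_action_group(graph, ["take", "go", "examine"])
    evidence = mine_decisive_edges(graph, group)
    target = evidence[0].dst if evidence else "around"
    return f"{group} {target}", evidence

if __name__ == "__main__":
    obs = [Edge("player", "is_in", "kitchen"),
           Edge("recipe", "needs", "carrot"),
           Edge("kitchen", "has", "carrot")]
    action, why = act(obs)
    print(action)                                       # e.g. "take carrot"
    print([(e.src, e.relation, e.dst) for e in why])    # decisive edges as the rationale
```

The decisive edges returned alongside the action are what make the decision inspectable: a reader can trace exactly which one-hop facts in the graph supported the chosen behavior.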