Deep Reinforcement Learning (DRL) has recently achieved significant advances in various domains. However, explaining the policy of RL agents remains an open problem due to several factors, one being the complexity of explaining neural network decisions. Recently, a line of work has used decision-tree-based models to learn explainable policies. Soft decision trees (SDTs) and discretized differentiable decision trees (DDTs) have been demonstrated to achieve both good performance and explainable policies. In this work, we further improve the results for tree-based explainable RL in both performance and explainability. Our proposal, Cascading Decision Trees (CDTs), applies representation learning on the decision path to allow richer expressivity. Empirical results show that in both settings, whether CDTs are applied as policy function approximators or as imitation learners to explain black-box policies, CDTs achieve better performance with more succinct and explainable models than SDTs. As a second contribution, our study reveals the limitation of explaining black-box policies via imitation learning with tree-based explainable models, due to the inherent instability of imitation learning.
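To make the architectural idea concrete, below is a minimal sketch of a soft decision tree and of cascading two such trees, where a feature-learning stage feeds learned intermediate features into a decision stage, so that each decision node tests a learned combination of inputs rather than raw observations. This assumes PyTorch; the class names `SoftDecisionTree` and `CascadingDecisionTree` and all parameter choices are hypothetical illustrations, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class SoftDecisionTree(nn.Module):
    """Depth-d soft decision tree: each inner node gates on a linear
    function of its input via a sigmoid; leaves hold output logits."""
    def __init__(self, in_dim, out_dim, depth=3):
        super().__init__()
        self.depth = depth
        n_inner, n_leaves = 2 ** depth - 1, 2 ** depth
        self.gates = nn.Linear(in_dim, n_inner)            # one gate per inner node
        self.leaves = nn.Parameter(torch.zeros(n_leaves, out_dim))

    def forward(self, x):
        B = x.shape[0]
        g = torch.sigmoid(self.gates(x))                   # (B, n_inner): prob of going right at each node
        path = torch.ones(B, 1, device=x.device)           # path probs at the current level
        start = 0
        for level in range(self.depth):                    # descend level by level (breadth-first)
            n = 2 ** level
            w = g[:, start:start + n]                      # gates of the n nodes at this level
            # each node splits its path mass between its left and right child
            path = torch.stack([path * (1 - w), path * w], dim=2).reshape(B, 2 * n)
            start += n
        return path @ self.leaves                          # (B, out_dim): expectation over leaves

class CascadingDecisionTree(nn.Module):
    """CDT sketch: a feature-learning tree produces intermediate features,
    and a decision tree over those features outputs action logits."""
    def __init__(self, in_dim, feat_dim, n_actions, feat_depth=2, dec_depth=2):
        super().__init__()
        self.feature_tree = SoftDecisionTree(in_dim, feat_dim, depth=feat_depth)
        self.decision_tree = SoftDecisionTree(feat_dim, n_actions, depth=dec_depth)

    def forward(self, x):
        z = self.feature_tree(x)       # learned representation along the decision path
        return self.decision_tree(z)   # action logits

# Usage on a CartPole-like task (4-dim state, 2 discrete actions):
policy = CascadingDecisionTree(in_dim=4, feat_dim=3, n_actions=2)
logits = policy(torch.randn(8, 4))     # (8, 2) batch of action logits
```

The intended contrast with an SDT is that cascading keeps each stage shallow: the decision stage can stay small and readable because the feature stage supplies richer expressivity, rather than growing a single deep tree over raw inputs.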