Gradient-based approaches in reinforcement learning (RL) have achieved tremendous success in learning policies for continuous control problems. While the performance of these approaches warrants real-world adoption in domains such as autonomous driving and robotics, the resulting policies lack interpretability, limiting deployability in safety-critical and legally regulated settings. Such domains require interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based RL approaches to produce high-performing, interpretable policies. The key to our approach is a procedure that allows direct optimization in a sparse decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning interpretable policy representations that match or outperform baselines by up to 33$\%$ in autonomous driving scenarios while achieving a $300$x-$600$x reduction in the number of policy parameters relative to deep learning baselines.
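To make the core idea concrete, below is a minimal sketch of a differentiable decision-tree policy in PyTorch: a soft (sigmoid-gated) decision node with a sparse linear split routing between two learnable continuous-action leaves. This is an illustrative assumption about the general technique, not the paper's exact ICCT procedure; the names `SoftTreePolicy`, `w`, `b`, and the leaf parameters are hypothetical.

```python
# A minimal sketch of a differentiable decision-tree policy, assuming a
# soft (sigmoid-gated) split in the spirit of ICCTs; this is illustrative,
# not the paper's exact formulation.
import torch
import torch.nn as nn

class SoftTreePolicy(nn.Module):
    """Depth-1 soft tree: one decision node routing to two leaf actions."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        # Linear split over observation features plus a bias/threshold;
        # sparsifying w (e.g., keeping one feature) yields readable rules.
        self.w = nn.Parameter(torch.randn(obs_dim))
        self.b = nn.Parameter(torch.zeros(1))
        # Each leaf holds a learnable continuous action (e.g., a steering value).
        self.leaf_left = nn.Parameter(torch.zeros(act_dim))
        self.leaf_right = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Soft routing probability; because it is differentiable, gradients
        # from a standard RL objective flow through the split parameters.
        p_right = torch.sigmoid(obs @ self.w + self.b).unsqueeze(-1)  # (batch, 1)
        return p_right * self.leaf_right + (1 - p_right) * self.leaf_left
```

After training, the sigmoid gate can be hardened into a crisp threshold (route to whichever branch has probability above 0.5), recovering an interpretable if/else rule over a small number of observation features.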