Black-box AI induction methods such as deep reinforcement learning (DRL) are increasingly used to find optimal policies for a given control task. Although policies represented by a black-box AI can execute the underlying control task efficiently and achieve optimal closed-loop performance, the resulting control rules are often complex and neither interpretable nor explainable. In this paper, we use a recently proposed nonlinear decision-tree (NLDT) approach to find a hierarchical set of control rules, in an attempt to maximize the open-loop performance with which they approximate and explain a pre-trained black-box DRL (oracle) agent, using a labelled state-action dataset. Recent advances in nonlinear optimization using evolutionary computation facilitate finding a hierarchical set of nonlinear control rules as a function of the state variables, using a computationally fast bilevel optimization procedure at each node of the proposed NLDT. Additionally, we propose a re-optimization procedure for enhancing the closed-loop performance of an already derived NLDT. We evaluate our proposed methodologies (open-loop and closed-loop NLDTs) on different control problems having multiple discrete actions. In all these problems, our proposed approach finds relatively simple and interpretable rules involving one to four nonlinear terms per rule, while simultaneously achieving closed-loop performance on par with a trained black-box DRL agent. A post-processing approach for simplifying the NLDT is also suggested. The results suggest that complicated black-box DRL policies involving thousands of parameters (making them non-interpretable) can be replaced with relatively simple interpretable policies, and they motivate further applications of the proposed approach to more complex control tasks.
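To make the idea of a hierarchical set of nonlinear control rules concrete, the following is a minimal illustrative sketch, not the implementation used in this work. It assumes each split rule is a weighted sum of power-law terms of the state variables plus a bias; this rule form, the class and function names (`Node`, `Leaf`, `act`, `open_loop_accuracy`), and the toy weights are all hypothetical. The sketch shows how such a tree maps a state to a discrete action, and how open-loop performance can be measured as agreement with oracle actions on a labelled state-action dataset.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass
class Leaf:
    action: int  # discrete action index returned at this leaf


@dataclass
class Node:
    # Assumed rule form: bias + sum_i weights[i] * prod_j state[j] ** exponents[i][j]
    weights: List[float]
    exponents: List[List[int]]
    bias: float
    left: Union["Node", Leaf]   # followed when rule(state) <= 0
    right: Union["Node", Leaf]  # followed when rule(state) > 0

    def rule(self, state: Sequence[float]) -> float:
        total = self.bias
        for w, exps in zip(self.weights, self.exponents):
            term = 1.0
            for x, e in zip(state, exps):
                term *= x ** e
            total += w * term
        return total


def act(tree: Union[Node, Leaf], state: Sequence[float]) -> int:
    """Descend the tree, evaluating one rule per node, until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.left if node.rule(state) <= 0 else node.right
    return node.action


def open_loop_accuracy(tree, dataset: List[Tuple[Sequence[float], int]]) -> float:
    """Fraction of labelled states on which the tree matches the oracle action."""
    hits = sum(1 for state, oracle_action in dataset if act(tree, state) == oracle_action)
    return hits / len(dataset)


# Hypothetical two-level tree over a two-dimensional state (x0, x1).
toy_tree = Node(
    weights=[1.0, 0.5],
    exponents=[[1, 1], [0, 2]],  # terms: x0*x1 and x1**2
    bias=-0.2,
    left=Leaf(action=0),
    right=Node(
        weights=[1.0],
        exponents=[[1, 0]],      # term: x0
        bias=0.0,
        left=Leaf(action=1),
        right=Leaf(action=2),
    ),
)

# Made-up oracle labels, used only to exercise the accuracy computation.
toy_dataset = [((0.1, 0.2), 0), ((1.0, 1.0), 2), ((-1.0, -1.0), 1), ((2.0, 0.5), 1)]
```

In this sketch, `act(toy_tree, state)` plays the role of the interpretable policy, and `open_loop_accuracy(toy_tree, toy_dataset)` compares it against the oracle labels without running the controller in the environment. Closed-loop performance would instead require rolling the policy out on the control task itself.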