Towards Interpretable-AI Policies Induction using Evolutionary Nonlinear Decision Trees for Discrete Action Systems

Black-box AI induction methods such as deep reinforcement learning (DRL) are increasingly used to find optimal policies for a given control task. Although policies represented by a black-box AI can execute the underlying control task efficiently and achieve optimal closed-loop performance, the resulting control rules are often complex and neither interpretable nor explainable. In this paper, we use a recently proposed nonlinear decision-tree (NLDT) approach to find a hierarchical set of control rules that maximizes open-loop performance, approximating and explaining a pre-trained black-box DRL (oracle) agent from a labelled state-action dataset. Recent advances in nonlinear optimization using evolutionary computation facilitate finding a hierarchical set of nonlinear control rules, expressed as functions of the state variables, through a computationally fast bilevel optimization procedure at each node of the proposed NLDT. Additionally, we propose a re-optimization procedure for enhancing the closed-loop performance of an already derived NLDT. We evaluate the proposed methodologies (open-loop and closed-loop NLDTs) on different control problems with multiple discrete actions. In all of these problems, the proposed approach finds relatively simple and interpretable rules involving one to four nonlinear terms per rule, while achieving closed-loop performance on par with a trained black-box DRL agent. A post-processing approach for simplifying the NLDT is also suggested. The results suggest that complicated black-box DRL policies involving thousands of parameters (which makes them non-interpretable) can be replaced with relatively simple interpretable policies. They are encouraging and motivate further application of the proposed approach to more complex control tasks.
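To make the idea of a hierarchical set of nonlinear control rules concrete, here is a minimal sketch of how such a tree could act as a policy and how its open-loop agreement with an oracle could be measured. It rests on several assumptions that the abstract does not state: each split rule is taken to be a weighted sum of power-law terms in the state variables plus a bias, the `Leaf`, `Split`, `rule_value`, `act`, and `open_loop_accuracy` names are hypothetical, and the toy tree's weights and exponents are invented for illustration. The bilevel evolutionary optimization that would actually find the rules is not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Leaf:
    action: int  # discrete action returned at this leaf


@dataclass
class Split:
    weights: List[float]          # one weight per nonlinear term
    exponents: List[List[float]]  # exponents[i][j]: power of state variable j in term i
    bias: float
    left: "Node"                  # followed when the rule value is <= 0
    right: "Node"                 # followed when the rule value is > 0


Node = Union[Leaf, Split]


def rule_value(split: Split, state: Sequence[float]) -> float:
    """Evaluate the assumed rule form: bias + sum_i w_i * prod_j x_j ** b_ij."""
    total = split.bias
    for weight, exps in zip(split.weights, split.exponents):
        term = 1.0
        for x, b in zip(state, exps):
            term *= x ** b
        total += weight * term
    return total


def act(node: Node, state: Sequence[float]) -> int:
    """Walk from the root to a leaf and return that leaf's action."""
    while isinstance(node, Split):
        node = node.left if rule_value(node, state) <= 0 else node.right
    return node.action


def open_loop_accuracy(tree: Node, states, oracle_actions) -> float:
    """Fraction of labelled states on which the tree matches the oracle."""
    hits = sum(act(tree, s) == a for s, a in zip(states, oracle_actions))
    return hits / len(states)


# Toy single-rule tree over a two-variable state (values are made up):
# choose action 1 if x0*x1 + 0.5*x1^2 - 0.2 > 0, otherwise action 0.
tree = Split(
    weights=[1.0, 0.5],
    exponents=[[1, 1], [0, 2]],
    bias=-0.2,
    left=Leaf(0),
    right=Leaf(1),
)

states = [[1.0, 1.0], [0.1, 0.1], [-1.0, 0.5]]
oracle_actions = [1, 0, 1]  # stand-in labels for a DRL oracle's choices
accuracy = open_loop_accuracy(tree, states, oracle_actions)
```

The example keeps exponents as non-negative integers so that negative or zero state values evaluate safely; a real implementation would need to handle fractional or negative exponents explicitly. Open-loop accuracy here only measures agreement on the labelled dataset, whereas closed-loop performance would require running the tree in the control environment itself.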


