Rapid advancements in deep learning have led to many recent breakthroughs. While deep learning models achieve superior performance, often statistically better than humans, their adoption in safety-critical settings, such as healthcare or self-driving cars, is hindered by their inability to provide safety guarantees and by the difficulty of analyzing their inner workings. We present MoET, a novel model based on Mixture of Experts, consisting of decision tree experts and a generalized linear model gating function. While the decision boundaries of decision trees (used in an existing verifiable approach) are axis-perpendicular hyperplanes, MoET supports hyperplanes of arbitrary orientation as boundaries. To support non-differentiable decision trees as experts, we formulate a novel training procedure. In addition, we introduce a hard-thresholding version, MoET_h, in which predictions are made solely by a single expert chosen via the gating function. Thanks to this property, each MoET_h prediction can easily be decomposed into a set of logical rules. Such rules can be translated into a manageable SMT formula, providing rich means for verification. While MoET is a general-purpose model, we illustrate its power in the reinforcement learning setting. By training MoET models on deep RL agents using an imitation learning procedure, we outperform the previous state-of-the-art technique based on decision trees while preserving the verifiability of the models.
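The hard-gating idea behind MoET_h can be sketched in a few lines: a linear gating function routes each input to exactly one tree expert, so every prediction decomposes into that expert's axis-aligned rules conjoined with the gating hyperplane constraints. The sketch below is illustrative only, with assumed names and a deliberately simplified training step (random gating weights and depth-1 stumps standing in for the paper's actual training procedure and full decision trees):

```python
import numpy as np

class Stump:
    """Depth-1 decision tree (one axis-aligned split) standing in for a
    full decision-tree expert; assumes binary 0/1 labels."""
    def fit(self, X, y):
        best, best_err = (0, 0.0, 0, 1), len(y) + 1
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                left = X[:, f] <= t
                for ll in (0, 1):  # label on the left branch
                    err = (y[left] != ll).sum() + (y[~left] != 1 - ll).sum()
                    if err < best_err:
                        best_err, best = err, (f, t, ll, 1 - ll)
        self.f, self.t, self.ll, self.rl = best
        return self

    def predict(self, X):
        return np.where(X[:, self.f] <= self.t, self.ll, self.rl)

class MoETHard:
    """Illustrative hard-gated mixture: argmax over a linear gating picks
    a single expert per input (the MoET_h-style hard thresholding)."""
    def __init__(self, n_experts, seed=0):
        self.n_experts = n_experts
        self.seed = seed
        self.experts = [Stump() for _ in range(n_experts)]

    def fit(self, X, y):
        # Simplification: fixed random gating weights instead of the
        # paper's learned gating; each expert is fit on its partition.
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(size=(X.shape[1], self.n_experts))
        assign = np.argmax(X @ self.W, axis=1)
        for i, expert in enumerate(self.experts):
            idx = assign == i
            if not idx.any():          # guard: empty partition
                idx = np.ones(len(X), dtype=bool)
            expert.fit(X[idx], y[idx])
        return self

    def predict(self, X):
        # Hard gating: route each input to its single top-scoring expert.
        assign = np.argmax(X @ self.W, axis=1)
        out = np.empty(len(X), dtype=int)
        for i, expert in enumerate(self.experts):
            idx = assign == i
            if idx.any():
                out[idx] = expert.predict(X[idx])
        return out
```

Because only one expert fires per input, the rule set explaining a prediction is the conjunction of the winning expert's split conditions and the linear inequalities under which that expert wins the argmax, which is what makes the SMT encoding tractable.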