Learning Interpretable Models Using an Oracle

We look at a specific aspect of model interpretability: models often need to be constrained in size to be considered interpretable; for example, a decision tree of depth 5 is easier to interpret than one of depth 50. But smaller models also tend to have high bias. This suggests a trade-off between interpretability and accuracy. We propose a model-agnostic technique to minimize this trade-off. Our strategy is to first learn an oracle, a highly accurate probabilistic model, on the training data. The uncertainty in the oracle's predictions is used to learn a sampling distribution for the training data. The interpretable model is then trained on a data sample obtained using this distribution, often leading to significantly greater accuracy. We formulate the sampling strategy as an optimization problem. Our solution possesses the following key favorable properties: (1) it uses a fixed number of seven optimization variables, irrespective of the dimensionality of the data; (2) it is model-agnostic, in that both the interpretable model and the oracle may belong to arbitrary model families; (3) it has a flexible notion of model size, and can accommodate vector sizes; (4) it is a framework, enabling it to benefit from progress in the area of optimization. We also present the following interesting observations: (a) in general, the optimal training distribution at small model sizes is different from the test distribution; (b) this effect exists even when the interpretable model and the oracle are from highly disparate model families: we show this on a text classification task, by using a Gated Recurrent Unit network as an oracle to improve the sequence classification accuracy of a Decision Tree that uses character n-grams; (c) our technique may be used to identify an optimal training sample of a given sample size for a model.
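The core idea of turning oracle uncertainty into a sampling distribution can be illustrated with a small sketch. This is not the authors' method: the margin-based `uncertainty` score, the fixed `sharpness` exponent, and the toy `oracle_probs` values are all assumptions made for illustration. The abstract states that the sampling distribution is learned by solving an optimization problem over seven variables; here a hand-picked weighting stands in for that learned distribution.

```python
import random

def uncertainty(probs):
    """Assumed score: 1 minus the gap between the two largest class probabilities."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)

def sample_by_uncertainty(oracle_probs, sample_size, sharpness=2.0, seed=0):
    """Draw training indices with probability proportional to uncertainty**sharpness.

    `sharpness` is a hypothetical fixed knob; in the approach described above,
    the shape of the sampling distribution would be found by optimization.
    """
    rng = random.Random(seed)
    weights = [uncertainty(p) ** sharpness for p in oracle_probs]
    return rng.choices(range(len(oracle_probs)), weights=weights, k=sample_size)

# Made-up oracle class probabilities for five training points (two classes).
oracle_probs = [
    [0.99, 0.01],  # oracle is confident
    [0.55, 0.45],  # oracle is unsure
    [0.80, 0.20],
    [0.50, 0.50],  # oracle is maximally unsure
    [0.95, 0.05],
]

scores = [uncertainty(p) for p in oracle_probs]
indices = sample_by_uncertainty(oracle_probs, sample_size=1000)
```

The resulting `indices` would select the rows on which a size-constrained model (for example, a shallow decision tree) is then trained. Whether emphasizing uncertain points actually helps is an empirical question, which is why the choice is framed as an optimization problem rather than fixed in advance.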
