Interpretability is a pressing issue for machine learning. Common approaches to interpretable machine learning constrain interactions between features of the input, rendering the effects of those features on a model's output comprehensible but at the expense of model complexity. We approach interpretability from a new angle: constrain the information about the features without restricting the complexity of the model. Borrowing from information theory, we use the Distributed Information Bottleneck to find optimal compressions of each feature that maximally preserve information about the output. The learned information allocation, by feature and by feature value, provides rich opportunities for interpretation, particularly in problems with many features and complex feature interactions. The central object of analysis is not a single trained model, but rather a spectrum of models serving as approximations that leverage variable amounts of information about the inputs. Information is allocated to features by their relevance to the output, thereby solving the problem of feature selection by constructing a learned continuum of feature inclusion-to-exclusion. The optimal compression of each feature -- at every stage of approximation -- allows fine-grained inspection of the distinctions among feature values that are most impactful for prediction. We develop a framework for extracting insight from the spectrum of approximate models and demonstrate its utility on a range of tabular datasets.
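For concreteness, the trade-off referenced in the abstract can be written as the standard Distributed Information Bottleneck objective. The sketch below uses generic notation (features $X_i$, per-feature compressed representations $U_i$, output $Y$, and a trade-off parameter $\beta$) and is an assumption about the precise formulation rather than a quotation of the paper's equations:

\[
\min_{\{p(u_i \mid x_i)\}_{i=1}^{K}} \;\; \beta \sum_{i=1}^{K} I(U_i; X_i) \;-\; I(U_1, \dots, U_K; Y)
\]

Sweeping $\beta$ traces out the spectrum of approximate models described above: at large $\beta$ essentially no information about any feature is retained, and as $\beta$ decreases, information is allocated first to the features, and to the distinctions among feature values, that are most relevant for predicting $Y$.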