We present the \emph{minimal feature removal problem} for neural networks, a combinatorial problem with interesting potential applications for improving the interpretability and robustness of neural network predictions. For a given input to a trained neural network, our aim is to compute a smallest set of input features whose removal, by setting them to a given uninformative baseline value, changes the model's prediction. We show that computing such minimal subsets of features is computationally intractable for fully-connected neural networks with ReLU nonlinearities. We show, however, that the problem becomes solvable in polynomial time for monotonic networks via a greedy algorithm. We then show that our tractability result extends seamlessly to more advanced neural network architectures such as convolutional and graph neural networks under suitable monotonicity assumptions.
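To make the problem statement concrete, the following is a minimal, hypothetical sketch of a greedy removal procedure of the kind alluded to above: it repeatedly masks (sets to the baseline) the feature that moves the model's score furthest toward the decision boundary, stopping once the predicted class flips. The function names (`score`, `greedy_minimal_removal`) and the linear scorer in the usage example are illustrative assumptions, not the paper's actual algorithm; in particular, the greedy step is only a natural candidate for monotonic models and a mere heuristic in general.

```python
import numpy as np

def greedy_minimal_removal(score, x, baseline):
    """Greedy search for a small set of features whose removal
    (setting them to the baseline) flips a binary prediction.

    `score` maps an input vector to a real number; the predicted
    class is taken to be sign(score(x)).  Returns the indices of
    the masked features, or None if no masking flips the class.
    """
    x = np.asarray(x, dtype=float).copy()
    baseline = np.asarray(baseline, dtype=float)
    target_sign = np.sign(score(x))
    removed, remaining = [], list(range(len(x)))
    while remaining:
        # Try masking each remaining feature; keep the one that moves
        # the score furthest toward (or across) the decision boundary.
        trials = []
        for i in remaining:
            trial = x.copy()
            trial[i] = baseline[i]
            trials.append((score(trial), i))
        best_score, best_i = min(trials, key=lambda t: target_sign * t[0])
        x[best_i] = baseline[best_i]
        removed.append(best_i)
        remaining.remove(best_i)
        if np.sign(best_score) != target_sign:
            return removed          # prediction has changed
    return None                     # no removal set found

# Usage with a monotone linear scorer f(x) = w.x + b (purely illustrative).
w, b = np.array([2.0, 1.0, 0.5]), -2.0
f = lambda v: float(v @ w + b)
print(greedy_minimal_removal(f, x=np.array([1.0, 1.0, 1.0]),
                             baseline=np.zeros(3)))   # -> [0]
```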