We build on abduction-based explanations for machine learning and develop a method for computing local explanations for neural network models in natural language processing (NLP). Our explanations comprise a subset of the words of the input text that satisfies two key properties: optimality w.r.t. a user-defined cost function, such as the length of the explanation, and robustness, in that they ensure prediction invariance for any bounded perturbation in the embedding space of the left-out words. We present two solution algorithms, respectively based on implicit hitting sets and maximum universal subsets, introducing a number of algorithmic improvements to speed up convergence on hard instances. We show how our method can be configured with different perturbation sets in the embedding space and used to detect bias in predictions by enforcing include/exclude constraints on biased terms, as well as to enhance existing heuristic-based NLP explanation frameworks such as Anchors. We evaluate our framework on three widely used sentiment analysis tasks and texts of up to 100 words from the SST, Twitter and IMDB datasets, demonstrating the effectiveness of the derived explanations.
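To make the robustness property concrete, the sketch below illustrates one way to empirically test whether a candidate explanation (a subset of token positions) keeps the model's prediction invariant when the embeddings of the left-out words are perturbed within a bounded region. This is only an illustrative sampling-based check under assumed interfaces (a model that consumes an embedding matrix, an L-infinity ball of radius `epsilon`, and a fixed sampling budget); the paper's algorithms establish invariance exactly rather than by sampling.

```python
# Minimal sketch (not the paper's exact procedure): sample bounded perturbations of the
# embeddings of words *outside* the explanation and check that the predicted class never
# changes. Passing this check is necessary, not sufficient, for robustness.
import torch


def is_prediction_invariant(model, embeddings, explanation_idx,
                            epsilon=0.05, n_samples=200):
    """embeddings: (seq_len, dim) tensor; explanation_idx: positions kept fixed."""
    model.eval()
    with torch.no_grad():
        base_pred = model(embeddings.unsqueeze(0)).argmax(dim=-1).item()

        seq_len, dim = embeddings.shape
        free_idx = [i for i in range(seq_len) if i not in set(explanation_idx)]

        for _ in range(n_samples):
            perturbed = embeddings.clone()
            # Perturb only the left-out words, within an L-infinity ball of radius epsilon.
            noise = (torch.rand(len(free_idx), dim) * 2 - 1) * epsilon
            perturbed[free_idx] += noise
            if model(perturbed.unsqueeze(0)).argmax(dim=-1).item() != base_pred:
                return False
    return True
```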