The importance of the neighborhood for training a local surrogate model to approximate the local decision boundary of a black-box classifier has already been highlighted in the literature. Several attempts have been made to construct a better neighborhood for high-dimensional data, such as text, by using generative autoencoders. However, existing approaches mainly generate neighbors by sampling purely at random from the latent space and, under the curse of dimensionality, struggle to learn a good local decision boundary. To overcome this problem, we propose a progressive approximation of the neighborhood that uses counterfactual instances as initial landmarks and a careful two-stage sampling approach to refine the counterfactuals and generate factuals in the neighborhood of the input instance to be explained. Our work focuses on textual data, and our explanations consist of both word-level explanations from the original instance (intrinsic) and the neighborhood (extrinsic), as well as factual and counterfactual instances discovered during the neighborhood generation process that further reveal the effect of altering certain parts of the input text. Our experiments on real-world datasets demonstrate that our method outperforms the competitors in terms of usefulness and stability (for the qualitative part) and completeness, compactness and correctness (for the quantitative part).
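The two-stage idea can be illustrated with a toy sketch: first probe the latent space for counterfactual landmarks (points the black box labels differently from the input), then refine each landmark toward the input, collecting factual neighbors along the way. Everything here is an assumption for illustration: a 2-D stand-in for an autoencoder's latent space, a linear black box, and bisection as the refinement step; the paper's actual sampling procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black box over a 2-D "latent" space
# (a stand-in for an autoencoder's latent representation of text).
w = np.array([1.0, -1.0])
def black_box(z):
    return (z @ w > 0).astype(int)

z0 = np.array([1.5, 0.5])            # latent code of the instance to explain
y0 = black_box(z0[None, :])[0]

# Stage 1: probe random directions around z0 until we collect
# counterfactual landmarks, i.e. points with a flipped prediction.
landmarks = []
while len(landmarks) < 5:
    z = z0 + rng.normal(scale=2.0, size=2)
    if black_box(z[None, :])[0] != y0:
        landmarks.append(z)

# Stage 2: refine each landmark by bisecting the segment back toward z0,
# keeping the label flipped. This pulls counterfactuals close to the
# local decision boundary; the last same-label midpoint is kept as a
# nearby factual neighbor.
counterfactuals, factuals = [], []
for z_cf in landmarks:
    lo, hi = z0, z_cf                # lo: factual side, hi: counterfactual side
    for _ in range(20):
        mid = (lo + hi) / 2
        if black_box(mid[None, :])[0] == y0:
            lo = mid
        else:
            hi = mid
    factuals.append(lo)
    counterfactuals.append(hi)
```

After refinement, both sets hug the surrogate's decision boundary: the counterfactuals show minimal latent changes that flip the prediction, while the factuals anchor the same-label side of the neighborhood.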