Finding important features that contribute to the predictions of neural models is an active area of research in explainable AI. Neural models are opaque, and identifying such features helps us better understand their predictions. In contrast, in this work we present the inverse perspective of distractor features: features that cast doubt on a prediction by affecting the model's confidence in it. Understanding distractors provides a complementary view of feature relevance in the predictions of neural models. In this paper, we apply a reduction-based technique to find distractors and report preliminary results on their impact and types. Our experiments across various tasks, models, and datasets of code reveal that the removal of tokens can have a significant impact on a model's confidence in its prediction, and that the category of the removed token also plays a vital role. Our study aims to enhance the transparency of models by highlighting the tokens that most strongly influence their confidence.
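To make the idea of a reduction-based search concrete, the sketch below scores each token by how the model's confidence changes when that single token is removed; tokens whose removal raises confidence are candidate distractors. This is a minimal illustration under our own assumptions, not the paper's exact implementation: `find_distractors`, `confidence_fn`, and the toy tokens are hypothetical names introduced here for exposition.

```python
from typing import Callable, List, Tuple

def find_distractors(
    tokens: List[str],
    confidence_fn: Callable[[List[str]], float],
) -> List[Tuple[int, str, float]]:
    """One-token-at-a-time reduction: re-score the input with each token
    removed and record how the model's confidence in its original
    prediction changes. A positive delta means removal increased
    confidence, flagging the token as a potential distractor."""
    base = confidence_fn(tokens)
    deltas = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        delta = confidence_fn(reduced) - base
        deltas.append((i, tokens[i], delta))
    # Strongest distractors first (largest confidence gain on removal).
    return sorted(deltas, key=lambda t: t[2], reverse=True)

if __name__ == "__main__":
    # Stand-in confidence function for illustration only: pretends the
    # model is less sure whenever the token "unused_var" is present.
    def toy_confidence(toks: List[str]) -> float:
        return 0.55 if "unused_var" in toks else 0.90

    code_tokens = ["def", "add", "(", "a", ",", "b", ")", ":",
                   "unused_var", "return", "a", "+", "b"]
    print(find_distractors(code_tokens, toy_confidence)[:3])
```

In practice, `confidence_fn` would wrap a trained code model and return the softmax probability of its originally predicted label for the reduced input.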