The large size and complex decision mechanisms of state-of-the-art text classifiers make it difficult for humans to understand their predictions, leading to a potential lack of trust from users. These issues have led to the adoption of methods like SHAP and Integrated Gradients, which explain classification decisions by assigning importance scores to input tokens. However, prior work using different randomization tests has shown that the interpretations generated by these methods may not be robust. For instance, models that make the same predictions on the test set may still yield different feature importance rankings. To address this lack of robustness in token-based interpretability, we explore explanations at higher semantic levels, such as sentences. We use computational metrics and human-subject studies to compare the quality of sentence-based interpretations against token-based ones. Our experiments show that higher-level feature attributions offer several advantages: 1) they are more robust as measured by randomization tests, 2) they exhibit lower variability when computed with approximation-based methods like SHAP, and 3) they are more intelligible to humans when linguistic coherence resides at a higher level of granularity. Based on these findings, we show that token-based interpretability, while a convenient first choice given the input interfaces of ML models, is not the most effective choice in all situations.
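To make the token-level versus sentence-level contrast concrete, the sketch below (our illustration, not the authors' released code) aggregates token-level importance scores, as produced by any attribution method such as SHAP or Integrated Gradients, into sentence-level attributions by summing the scores of the tokens within each sentence. The function name `sentence_attributions`, the input format, and the toy scores are assumptions for illustration; the naive alignment assumes tokens appear in sentence order.

```python
# A minimal sketch, assuming token-level attribution scores are already
# available from some method (e.g., SHAP or Integrated Gradients) and are
# aligned, in order, with the tokens of each sentence.

from typing import List, Tuple


def sentence_attributions(
    sentences: List[List[str]],
    token_attributions: List[Tuple[str, float]],
) -> List[float]:
    """Sum token-level importance scores within each sentence."""
    scores = []
    idx = 0
    for sent in sentences:
        total = 0.0
        for _ in sent:
            _token, score = token_attributions[idx]
            total += score
            idx += 1
        scores.append(total)
    return scores


if __name__ == "__main__":
    # Toy example: two sentences with hypothetical token scores.
    sents = [["the", "plot", "is", "dull"], ["but", "the", "acting", "shines"]]
    tok_attr = [("the", 0.01), ("plot", 0.05), ("is", 0.0), ("dull", -0.40),
                ("but", 0.02), ("the", 0.01), ("acting", 0.10), ("shines", 0.35)]
    print(sentence_attributions(sents, tok_attr))  # [-0.34, 0.48]
```

Summation is only one plausible aggregation choice; averaging or max-pooling token scores within a sentence would be equally simple alternatives under the same setup.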