Linguistic Acceptability is the task of determining whether a sentence is grammatical or ungrammatical. It has applications in several use cases, such as Question Answering, Natural Language Generation, and Neural Machine Translation, where grammatical correctness is crucial. In this paper, we aim to understand the decision-making process of BERT (Devlin et al., 2019) in distinguishing between Linguistically Acceptable sentences (LA) and Linguistically Unacceptable sentences (LUA). We leverage Layer Integrated Gradients attribution scores (LIG) to explain the linguistic acceptability criteria that are learnt by BERT on the Corpus of Linguistic Acceptability (CoLA) (Warstadt et al., 2018) benchmark dataset. Our experiments on 5 categories of sentences lead to the following interesting findings: 1) the LIG for LA are significantly smaller than those for LUA; 2) there are specific subtrees of the Constituency Parse Tree (CPT) for LA and LUA that contribute larger LIG; 3) across the different categories of sentences, around 88% to 100% of the correctly classified sentences had positive LIG, indicating a strong positive relationship with the prediction confidence of the model; and 4) around 43% of the misclassified sentences had negative LIG; we believe these could become correctly classified if the LIG were parameterized in the loss function of the model.
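For context, Layer Integrated Gradients extends the standard Integrated Gradients attribution (Sundararajan et al., 2017) from input features to the activations of an intermediate layer, such as BERT's embedding layer. A sketch of the underlying per-feature attribution, where $x$ is the input, $x'$ a baseline (e.g., all padding-token embeddings), and $F$ the model's scalar output for the predicted class:

$$\mathrm{IG}_i(x) \;=\; (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha$$

In practice the integral is approximated by a Riemann sum over a small number of interpolation steps $\alpha_k = k/m$, $k = 1, \dots, m$; the sentence-level LIG scores discussed above can then be obtained by aggregating token-level attributions.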