Pre-trained transformer language models such as BERT are ubiquitous in NLP research, prompting work on understanding how and why these models work. Attention mechanisms have been proposed as a means of interpretability, with prior studies reaching varying conclusions. We propose applying BERT-based models to a sequence classification task and using the dataset's labeling schema to measure each model's interpretability. We find that classification performance scores do not always correlate with interpretability. Despite this, BERT's attention weights are interpretable for over 70% of examples.