This paper investigates how Transformer language models (LMs) fine-tuned for acceptability classification capture linguistic features. Our approach uses the best practices of topological data analysis (TDA) in NLP: we construct directed attention graphs from attention matrices, derive topological features from them, and feed them to linear classifiers. We introduce two novel features, chordality and the matching number, and show that TDA-based classifiers outperform fine-tuning baselines. We experiment with two datasets, CoLA and RuCoLA, in English and Russian, two typologically different languages. On top of that, we propose several black-box introspection techniques aimed at detecting changes in the attention mode of the LMs during fine-tuning, estimating the LMs' prediction confidence, and associating individual heads with fine-grained grammar phenomena. Our results contribute to understanding the behavior of monolingual LMs in the acceptability classification task, provide insights into the functional roles of attention heads, and highlight the advantages of TDA-based approaches for analyzing LMs. We release the code and the experimental results for further use.
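To make the two proposed graph features concrete, the sketch below shows one way they could be computed with networkx, assuming the attention matrix is first thresholded into an unweighted attention graph (the threshold value, the conversion to an undirected graph, and the helper names are illustrative assumptions, not the released implementation).

```python
# Minimal sketch: chordality and matching number of a thresholded attention graph.
# This is an illustrative reconstruction, not the authors' released code.
import numpy as np
import networkx as nx


def attention_graph(attn: np.ndarray, threshold: float = 0.1) -> nx.DiGraph:
    """Build a directed attention graph: edge i -> j iff attn[i, j] >= threshold."""
    n = attn.shape[0]
    g = nx.DiGraph()
    g.add_nodes_from(range(n))
    rows, cols = np.where(attn >= threshold)
    g.add_edges_from(zip(rows.tolist(), cols.tolist()))
    return g


def chordality_and_matching_number(g: nx.DiGraph) -> tuple[int, int]:
    """Compute the two features on the undirected version of the graph.

    Chordality and matchings are defined for undirected graphs, so edge
    directions and self-loops (diagonal attention) are dropped first.
    """
    u = g.to_undirected()
    u.remove_edges_from(list(nx.selfloop_edges(u)))
    is_chordal = int(nx.is_chordal(u))  # 1 if every cycle longer than 3 has a chord
    matching = nx.max_weight_matching(u, maxcardinality=True)
    return is_chordal, len(matching)    # matching number = size of a maximum matching


if __name__ == "__main__":
    # Toy attention matrix for a 4-token sentence (rows sum to 1).
    attn = np.array([
        [0.70, 0.10, 0.10, 0.10],
        [0.25, 0.50, 0.15, 0.10],
        [0.05, 0.30, 0.60, 0.05],
        [0.10, 0.10, 0.30, 0.50],
    ])
    g = attention_graph(attn, threshold=0.15)
    print(chordality_and_matching_number(g))
```

In the TDA pipeline described above, such per-head feature values would be concatenated across heads and layers and passed to a linear classifier.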