The probing methodology makes it possible to obtain a partial representation of the linguistic phenomena stored in the inner layers of a neural network, using external classifiers and statistical analysis. Pre-trained transformer-based language models are widely used for both natural language understanding (NLU) and natural language generation (NLG) tasks, which makes them the most common choice for downstream applications. However, little analysis has been carried out on whether these models are pre-trained sufficiently or contain knowledge that correlates with linguistic theory. We present a chronological probing study of English transformer models, namely MultiBERT and T5. We sequentially compare the linguistic information the models learn over the course of training on their corpora. The results show that 1) linguistic information is acquired in the early stages of training; 2) both language models demonstrate the capability to capture features from different levels of language, including morphology, syntax, and even discourse, while they can also fail inconsistently on tasks that are perceived as easy. We also introduce an open-source framework for chronological probing research, compatible with other transformer-based models: https://github.com/EkaterinaVoloshina/chronological_probing
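As a minimal illustration of probing with an external classifier (not the authors' framework; the model name, sentences, labels, and the logistic-regression probe are illustrative assumptions), the sketch below trains a linear probe on frozen hidden states of a pre-trained transformer, layer by layer, to test whether a toy linguistic property is linearly decodable:

```python
# Minimal probing sketch: fit a logistic-regression probe on frozen hidden
# states of a pre-trained transformer, one probe per layer.
# All names and data below are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

sentences = ["The cat sleeps .", "The cats sleep .", "A dog barks .", "Dogs bark ."]
labels = [0, 1, 0, 1]  # toy label: singular vs. plural subject

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

with torch.no_grad():
    enc = tokenizer(sentences, padding=True, return_tensors="pt")
    hidden = model(**enc).hidden_states  # tuple: (embeddings, layer 1, ..., layer N)

mask = enc["attention_mask"].unsqueeze(-1)
for layer_idx, layer in enumerate(hidden):
    # Mean-pool token representations into one vector per sentence.
    pooled = (layer * mask).sum(1) / mask.sum(1)
    X = pooled.numpy()
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.5, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(f"layer {layer_idx}: probe accuracy = {probe.score(X_test, y_test):.2f}")
```

In a chronological setting, the same probe would be fit on hidden states taken from successive training checkpoints of the model, so that probe accuracy can be compared across training steps rather than only across layers.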