Language is an outcome of our complex and dynamic human interactions, and natural language processing (NLP) techniques are hence built on human linguistic activities. Bidirectional Encoder Representations from Transformers (BERT) has recently gained popularity by establishing state-of-the-art scores on several NLP benchmarks. A Lite BERT (ALBERT) is, as its name suggests, a lightweight version of BERT, in which the number of parameters is reduced by repeatedly applying the same neural network, the Transformer's encoder layer. By pre-training the parameters on a massive amount of natural language data, ALBERT can convert input sentences into versatile high-dimensional vectors potentially capable of solving multiple NLP tasks. In that sense, ALBERT can be regarded as a well-designed high-dimensional dynamical system whose operator is the Transformer's encoder, and essential structures of human language are thus expected to be encapsulated in its dynamics. In this study, we investigated the embedded properties of ALBERT to reveal how NLP tasks are effectively solved by exploiting its dynamics, thereby aiming to explore the nature of human language through the dynamical expressions of the NLP model. Our short-term analysis clarified that the pre-trained model stably yields trajectories of higher dimensionality, which would enhance the expressive capacity required for NLP tasks. Our long-term analysis revealed that ALBERT intrinsically exhibits transient chaos, a typical nonlinear phenomenon in which dynamics are chaotic only during a transient, and that the pre-trained ALBERT model tends to produce chaotic trajectories for a significantly longer period than a randomly initialized one. Our results imply that local chaoticity would contribute to improving NLP performance, uncovering a novel aspect of the role of chaotic dynamics in human language behaviors.
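The view of a weight-shared encoder as a discrete-time dynamical system can be sketched with a toy stand-in: one fixed nonlinear map applied repeatedly to a state vector, with the separation of two nearby trajectories tracked as a standard diagnostic for (transient) chaos. This is a minimal illustrative sketch, not ALBERT itself; the map (a random linear transform, `tanh`, and a layer-norm-like rescaling) and all dimensions are assumptions chosen only to mimic the iterated-encoder structure.

```python
import numpy as np

# Toy stand-in for ALBERT's weight-shared encoder layer: a single
# nonlinear map applied repeatedly, forming a discrete-time dynamical
# system h_{t+1} = step(h_t). (Hypothetical surrogate, not the real model.)
rng = np.random.default_rng(0)
d = 64                                               # toy hidden dimensionality
W = rng.normal(scale=1.5 / np.sqrt(d), size=(d, d))  # shared (tied) weights

def step(h):
    """One application of the shared layer: linear map, tanh, then a
    layer-norm-like rescaling to zero mean and unit variance."""
    h = np.tanh(W @ h)
    return (h - h.mean()) / (h.std() + 1e-8)

# Divergence diagnostic: iterate two states that start infinitesimally
# close and record their distance over time. Rapid separation during the
# transient is the signature of locally chaotic dynamics.
h = rng.normal(size=d)
h_pert = h + 1e-8 * rng.normal(size=d)
for t in range(50):
    h, h_pert = step(h), step(h_pert)
    if (t + 1) % 10 == 0:
        print(f"step {t + 1:3d}  separation = {np.linalg.norm(h - h_pert):.3e}")
```

In the actual study, the iterated map is the pre-trained Transformer encoder and the separation statistics are compared between pre-trained and randomly initialized weights; this sketch only shows the shape of that measurement.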