Bidirectional Encoder Representations from Transformers (BERT) achieve state-of-the-art results in a variety of Natural Language Processing tasks. However, our understanding of their internal functioning is still insufficient. To better understand BERT and other Transformer-based models, we present a layer-wise analysis of BERT's hidden states. Unlike previous research, which mainly focuses on explaining Transformer models through their attention weights, we argue that hidden states contain equally valuable information. Specifically, our analysis focuses on models fine-tuned on the task of Question Answering (QA) as an example of a complex downstream task. We inspect how QA models transform token vectors in order to find the correct answer. To this end, we apply a set of general and QA-specific probing tasks that reveal the information stored in each representation layer. Our qualitative analysis of hidden state visualizations provides additional insights into BERT's reasoning process. Our results show that the transformations within BERT go through phases that are related to traditional pipeline tasks. The system can therefore implicitly incorporate task-specific information into its token representations. Furthermore, our analysis reveals that fine-tuning has little impact on the models' semantic abilities and that prediction errors can be recognized in the vector representations of even early layers.
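The probing methodology referenced above can be illustrated with a minimal sketch: a linear probe is fit on fixed token representations from a given layer, and its accuracy indicates how linearly recoverable a property is from that layer. The data below is synthetic (random class-dependent vectors standing in for per-layer BERT hidden states), and the setup is an assumption for illustration, not the paper's actual experimental configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-layer token vectors: 600 tokens, 32 dims,
# 3 hypothetical probing labels (e.g. coarse token categories).
n_tokens, hidden_dim, n_classes = 600, 32, 3
labels = rng.integers(0, n_classes, size=n_tokens)

# Each "hidden state" is a class-dependent mean plus Gaussian noise.
class_means = rng.normal(size=(n_classes, hidden_dim))
states = class_means[labels] + 0.5 * rng.normal(size=(n_tokens, hidden_dim))

# Linear probe: least-squares fit from frozen representations to
# one-hot targets; the representations themselves are never updated.
targets = np.eye(n_classes)[labels]
weights, *_ = np.linalg.lstsq(states, targets, rcond=None)

predictions = (states @ weights).argmax(axis=1)
accuracy = (predictions == labels).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

In the layer-wise setting, the same probe would be trained separately on each layer's hidden states, and the accuracy curve across layers indicates where in the network a given kind of information emerges.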