A large number of studies that analyze deep neural network models and their ability to encode various linguistic and non-linguistic concepts provide an interpretation of the inner mechanics of these models. However, the scope of these analyses is limited to pre-defined concepts that reinforce traditional linguistic knowledge; they do not reveal how novel concepts are learned by the model. We address this limitation by discovering and analyzing latent concepts learned in neural network models in an unsupervised fashion and provide interpretations from the model's perspective. In this work, we study: i) what latent concepts exist in the pre-trained BERT model, ii) how the discovered latent concepts align with or diverge from the classical linguistic hierarchy, and iii) how the latent concepts evolve across layers. Our findings show: i) a model learns novel concepts (e.g. animal categories and demographic groups) which do not strictly adhere to any pre-defined categorization (e.g. POS, semantic tags), ii) several latent concepts are based on multiple properties, which may include semantics, syntax, and morphology, iii) the lower layers in the model dominate in learning shallow lexical concepts while the higher layers learn semantic relations, and iv) the discovered latent concepts highlight potential biases learned in the model. We also release a novel BERT ConceptNet dataset (BCN) consisting of 174 concept labels and 1M annotated instances.
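As a rough illustration of the kind of unsupervised concept discovery described above, the sketch below clusters contextualized token representations extracted from a single BERT layer and treats each cluster as a candidate latent concept. It is a minimal sketch, not the paper's exact pipeline: it assumes the HuggingFace transformers and scikit-learn libraries, and the layer index, cluster count, and example sentences are illustrative placeholders.

```python
# Minimal sketch: discover candidate latent concepts by clustering
# contextualized token representations from one layer of pre-trained BERT.
# Assumes HuggingFace `transformers` and `scikit-learn`; settings are illustrative.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import AgglomerativeClustering

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_hidden_states=True)
model.eval()

sentences = [
    "The cat sat on the mat .",
    "Dogs and wolves are social animals .",
    "She moved to Berlin in 2015 .",
]

LAYER = 12        # which hidden layer to analyze (illustrative choice)
N_CLUSTERS = 5    # number of candidate concepts (illustrative choice)

tokens, vectors = [], []
with torch.no_grad():
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt")
        hidden = model(**enc).hidden_states[LAYER][0]  # (seq_len, dim)
        for tok, vec in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist()), hidden):
            if tok in ("[CLS]", "[SEP]"):
                continue
            tokens.append(tok)
            vectors.append(vec.numpy())

# Group token vectors; each resulting cluster is a candidate latent concept.
labels = AgglomerativeClustering(n_clusters=N_CLUSTERS).fit_predict(vectors)
for c in range(N_CLUSTERS):
    members = [t for t, l in zip(tokens, labels) if l == c]
    print(f"concept {c}: {members}")
```

Inspecting the member tokens of each cluster (e.g. whether they share a part of speech, a semantic field, or a morphological pattern) is one way to probe whether a discovered concept aligns with a pre-defined linguistic category or constitutes something novel.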