Black-box probing models can reliably extract linguistic features like tense, number, and syntactic role from pretrained word representations. However, the manner in which these features are encoded in representations remains poorly understood. We present a systematic study of the linear geometry of contextualized word representations in ELMo and BERT. We show that a variety of linguistic features (including structured dependency relationships) are encoded in low-dimensional subspaces. We then refine this geometric picture, showing that there are hierarchical relations between the subspaces encoding general linguistic categories and more specific ones, and that low-dimensional feature encodings are distributed rather than aligned to individual neurons. Finally, we demonstrate that these linear subspaces are causally related to model behavior, and can be used to perform fine-grained manipulation of BERT's output distribution.
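As an illustrative sketch only (not code from the paper), the snippet below shows one way to fit a linear probe on BERT's contextualized token representations for a single linguistic feature (grammatical number) and inspect the probe's weight spectrum as a rough proxy for the dimensionality of the feature's encoding. The model name, toy examples, and the word_vector helper are assumptions chosen for this example, not details taken from the abstract.

```python
# Illustrative sketch (assumed setup, not the authors' code): probe BERT token
# vectors for a binary linguistic feature (noun number) with a linear classifier.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

# Toy labeled examples: (sentence, target word, label) with 1 = plural, 0 = singular.
examples = [
    ("The dog sleeps on the porch.", "dog", 0),
    ("The dogs sleep on the porch.", "dogs", 1),
    ("A child reads a book.", "child", 0),
    ("The children read books.", "children", 1),
]

def word_vector(sentence, word):
    """Return the contextual embedding of the first wordpiece of `word`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]            # (seq_len, 768)
    word_id = tokenizer.encode(word, add_special_tokens=False)[0]
    pos = (enc["input_ids"][0] == word_id).nonzero()[0, 0]   # first occurrence
    return hidden[pos].numpy()

X = np.stack([word_vector(s, w) for s, w, _ in examples])
y = np.array([label for _, _, label in examples])

probe = LogisticRegression(max_iter=1000).fit(X, y)

# A binary probe is rank 1 by construction; for multi-class probes, the decay of
# the singular values of probe.coef_ gives a coarse estimate of how many
# dimensions the feature occupies in representation space.
sv = np.linalg.svd(probe.coef_, compute_uv=False)
print("probe accuracy:", probe.score(X, y))
print("singular values of probe weights:", sv)
```

This is only meant to make the abstract's notion of "low-dimensional subspaces" concrete; the paper's actual probing and causal-intervention procedures are described in the body of the work.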