Recent research investigates factual knowledge stored in large pretrained language models (PLMs). Instead of structured knowledge base (KB) queries, masked sentences such as "Paris is the capital of [MASK]" are used as probes. The good performance on this analysis task has been interpreted as evidence that PLMs are becoming potential repositories of factual knowledge. In experiments across ten linguistically diverse languages, we study the knowledge contained in static embeddings. We show that, when the output space is restricted to a candidate set, simple nearest neighbor matching using static embeddings performs better than PLMs. For example, static embeddings perform 1.6 percentage points better than BERT while using just 0.3% of the energy for training. One important factor in their good comparative performance is that static embeddings are standardly learned for a large vocabulary. In contrast, BERT exploits its more sophisticated but expensive ability to compose meaningful representations from a much smaller subword vocabulary.
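To make the candidate-restricted nearest neighbor matching concrete, here is a minimal sketch in Python. It assumes pretrained static word vectors loaded with gensim's KeyedVectors; the vector file path, the candidate set, and the example query are hypothetical illustrations, not the exact setup of the study.

```python
# Hedged sketch: nearest neighbor matching with static embeddings,
# restricted to a candidate set (file path and candidates are hypothetical).
import numpy as np
from gensim.models import KeyedVectors

# Load pretrained static word vectors in word2vec text format.
vectors = KeyedVectors.load_word2vec_format("embeddings.vec")  # hypothetical path

def predict(subject, candidates):
    """Return the candidate whose embedding has the highest cosine
    similarity to the subject word's embedding."""
    subj = vectors[subject]
    subj = subj / np.linalg.norm(subj)
    best, best_sim = None, -np.inf
    for cand in candidates:
        vec = vectors[cand]
        sim = float(np.dot(subj, vec / np.linalg.norm(vec)))
        if sim > best_sim:
            best, best_sim = cand, sim
    return best

# Restricting the output space to a candidate set of country names:
countries = ["France", "Germany", "Spain", "Italy"]
print(predict("Paris", countries))  # expected output: "France"
```

This contrasts with masked-sentence probing of a PLM, where the model fills "[MASK]" from its (subword) vocabulary; here the prediction is simply the nearest candidate in the static embedding space.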