Though linguistic knowledge emerges during large-scale language model pretraining, recent work attempts to explicitly incorporate human-defined linguistic priors into task-specific fine-tuning. Infusing language models with syntactic or semantic knowledge from parsers has shown improvements on many language understanding tasks. To further investigate the effectiveness of structural linguistic priors, we conduct an empirical study that replaces parsed graphs or trees with trivial ones (which carry little linguistic knowledge, e.g., balanced trees) on tasks in the GLUE benchmark. Encoding with trivial graphs achieves competitive or even better performance in both fully-supervised and few-shot settings. This reveals that the gains may not be attributable chiefly to explicit linguistic priors but rather to the additional feature interactions introduced by the fusion layers. Hence, we call for attention to trivial graphs as necessary baselines when designing advanced knowledge fusion methods in the future.
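To make the notion of a "trivial graph" concrete, below is a minimal sketch (not from the paper) of how a balanced binary tree over token positions could stand in for a dependency parse as the structure fed to a graph-based fusion layer. The function name and edge format are hypothetical illustrations, assuming the fusion layer consumes a list of (parent, child) index pairs.

```python
# A minimal sketch (illustrative, not the paper's implementation):
# instead of edges from a dependency parser, connect tokens with a
# balanced binary tree that carries essentially no linguistic knowledge.

from typing import List, Tuple


def balanced_tree_edges(num_tokens: int) -> List[Tuple[int, int]]:
    """Return (parent, child) edges of a balanced binary tree over
    token positions 0..num_tokens-1, usable as a stand-in for a
    parser-produced structure in a graph-encoding fusion layer."""
    edges: List[Tuple[int, int]] = []

    def build(lo: int, hi: int) -> int:
        # Root the subtree at the middle position and recurse on both halves.
        mid = (lo + hi) // 2
        if lo <= mid - 1:
            edges.append((mid, build(lo, mid - 1)))
        if mid + 1 <= hi:
            edges.append((mid, build(mid + 1, hi)))
        return mid

    if num_tokens > 0:
        build(0, num_tokens - 1)
    return edges


if __name__ == "__main__":
    tokens = ["The", "cat", "sat", "on", "the", "mat"]
    # The edges encode structure only, with no syntax or semantics:
    # [(0, 1), (2, 0), (4, 3), (4, 5), (2, 4)]
    print(balanced_tree_edges(len(tokens)))
```

Because such a tree is fully determined by sentence length, any gains from encoding it can only come from the extra interaction capacity of the fusion layers, which is exactly the confound the baseline is meant to expose.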