Post-processing of static embeddings has been shown to improve their performance on both lexical and sequence-level tasks. However, post-processing for contextualized embeddings remains an under-studied problem. In this work, we question the usefulness of post-processing for contextualized embeddings obtained from different layers of pre-trained language models. More specifically, we standardize individual neuron activations using z-score and min-max normalization, and remove top principal components using the all-but-the-top method. Additionally, we apply unit-length normalization to word representations. On a diverse set of pre-trained models, we show that post-processing unwraps vital information present in the representations for both lexical tasks (such as word similarity and analogy) and sequence classification tasks. Our findings raise interesting points in relation to the research studies that use contextualized representations, and suggest z-score normalization as an essential step to consider when using them in an application.
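For concreteness, the sketch below (Python/NumPy, not the authors' code) illustrates the four post-processing operations named above, applied to a matrix of contextualized embeddings; the function names and the choice of k for all-but-the-top are illustrative assumptions.

```python
# Minimal sketch of the post-processing operations discussed in the abstract,
# applied to a matrix X of contextualized embeddings, shape (num_tokens, dim).
import numpy as np


def zscore_normalize(X):
    # Standardize each dimension (neuron activation) to zero mean, unit variance.
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)


def minmax_normalize(X):
    # Rescale each dimension to the [0, 1] range.
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins + 1e-12)


def all_but_the_top(X, k=3):
    # All-but-the-top (Mu & Viswanath, 2018): subtract the mean, then remove
    # the projection onto the top-k principal components of the centered data.
    X_centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    top = Vt[:k]  # (k, dim) principal directions
    return X_centered - X_centered @ top.T @ top


def unit_length(X):
    # Scale each word representation to unit L2 norm.
    return X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)


if __name__ == "__main__":
    X = np.random.randn(1000, 768)  # e.g. token representations from one layer
    X_post = unit_length(all_but_the_top(zscore_normalize(X), k=3))
```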