The advent of contextual word embeddings -- representations of words that incorporate semantic and syntactic information from their context -- has led to tremendous improvements on a wide variety of NLP tasks. However, recent contextual models have prohibitively high computational cost in many use cases and are often hard to interpret. In this work, we demonstrate that our proposed distillation method, a simple extension of CBOW-based training, significantly improves the computational efficiency of NLP applications while producing static embeddings that outperform both those trained from scratch and those distilled with previously proposed methods. As a side effect, our approach also allows a fair comparison of contextual and static embeddings via standard lexical evaluation tasks.
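To make the "extension of CBOW-based training" concrete, the following is a minimal sketch, not the paper's exact recipe: a standard CBOW objective (average of context vectors predicts the centre word) augmented with a distillation term that pulls each static vector toward a per-word target vector assumed to come from a contextual teacher model. The `teacher_vectors` tensor, the toy vocabulary, and the loss weighting are all hypothetical stand-ins for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy vocabulary and dimensions (illustrative only).
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "the2": 4, "mat": 5}
dim = 16

# Student: a static embedding table, plus a CBOW output projection.
static_emb = nn.Embedding(len(vocab), dim)
out_proj = nn.Linear(dim, len(vocab), bias=False)

# Hypothetical stand-in for averaged contextual-teacher representations,
# one fixed target vector per vocabulary word.
teacher_vectors = torch.randn(len(vocab), dim)

opt = torch.optim.Adam(
    list(static_emb.parameters()) + list(out_proj.parameters()), lr=1e-2
)

# Toy corpus of (context word ids, centre word id) pairs, CBOW-style.
pairs = [([0, 2], 1), ([1, 3], 2), ([2, 4], 3), ([3, 5], 4)]

for epoch in range(100):
    for ctx_ids, centre_id in pairs:
        ctx = torch.tensor(ctx_ids)
        centre = torch.tensor([centre_id])

        # CBOW term: average the context embeddings and predict the centre word.
        ctx_vec = static_emb(ctx).mean(dim=0, keepdim=True)
        cbow_loss = F.cross_entropy(out_proj(ctx_vec), centre)

        # Distillation term (assumed): match the centre word's static vector
        # to its teacher target vector.
        distill_loss = F.mse_loss(static_emb(centre), teacher_vectors[centre])

        loss = cbow_loss + distill_loss
        opt.zero_grad()
        loss.backward()
        opt.step()

# After training, static_emb.weight rows serve as standalone word vectors for
# lexical evaluation; no contextual model is needed at inference time.
```

The point of the sketch is the efficiency claim in the abstract: once trained, lookups into `static_emb.weight` replace full contextual-model forward passes, while the teacher signal is meant to carry over some of the contextual model's quality.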