Driven by encouraging results on a wide range of tasks, the field of NLP is experiencing an accelerated race to develop bigger language models. This race for bigger models has also underscored the need to continue pursuing practical distillation approaches that can leverage the knowledge acquired by these large models in a compute-efficient manner. With this goal in mind, we build on recent work to propose a hallucination-free framework for sequence tagging that is especially suited for distillation. We report empirical results establishing new state-of-the-art performance across multiple sequence labelling datasets, and we validate the usefulness of this framework for distilling a large model in a few-shot learning scenario.