Acronym extraction aims to find acronyms (i.e., short forms) and their meanings (i.e., long forms) in documents, a task that is important for scientific document understanding (SDU@AAAI-22). Previous work models this task as a paragraph-level sequence labeling problem. However, this formulation makes little effective use of external knowledge, which is especially limiting when the datasets are low-resource. Recently, prompt-based methods built on large pre-trained language models have been shown to significantly improve performance on low-resource downstream tasks. In this paper, we propose a Prompt-based Sequence Generation (PSG) method for the acronym extraction task. Specifically, we design a template that prompts the model to generate the extracted acronym texts auto-regressively. A position extraction algorithm then extracts the positions of the generated answers. Results on Vietnamese and Persian acronym extraction in a low-resource setting show that the proposed method outperforms all other competitive state-of-the-art (SOTA) methods.
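The abstract does not spell out the template or the position extraction algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: the prompt wording in `build_prompt` is hypothetical, and `locate_answers` uses plain exact substring matching to map generated strings back to character spans, which may differ from what the paper actually does.

```python
from typing import List, Tuple


def build_prompt(paragraph: str) -> str:
    """Hypothetical template: the wording is an assumption, not the paper's."""
    return f"{paragraph} The acronyms and long forms in this text are: "


def locate_answers(paragraph: str, answers: List[str]) -> List[Tuple[str, int, int]]:
    """Map generated answer strings back to character spans in the paragraph.

    Returns (answer, start, end) tuples with `end` exclusive, one per
    non-overlapping exact occurrence, sorted by start offset. Exact matching
    is an assumed simplification; generated text that does not appear
    verbatim in the paragraph is silently skipped.
    """
    spans = []
    for answer in answers:
        if not answer:
            continue  # an empty string would match at every offset
        start = paragraph.find(answer)
        while start != -1:
            end = start + len(answer)
            spans.append((answer, start, end))
            start = paragraph.find(answer, end)  # continue after this match
    return sorted(spans, key=lambda span: span[1])


if __name__ == "__main__":
    text = "Support vector machines (SVM) are popular; an SVM is a classifier."
    # Stand-in for strings a language model might generate from the prompt.
    generated = ["SVM", "Support vector machines"]
    print(build_prompt(text))
    for answer, start, end in locate_answers(text, generated):
        print(answer, start, end)
```

In this sketch the generated strings are hard-coded; in practice they would come from decoding the language model's continuation of the prompt. Exact matching keeps the example short, but a real system would likely need to handle casing, tokenization differences, and answers the model paraphrases.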