Automatic term extraction (ATE) is a Natural Language Processing (NLP) task that eases the effort of manually identifying terms in domain-specific corpora by providing a list of candidate terms. As units of knowledge in a specific field of expertise, extracted terms are not only beneficial for several terminographical tasks, but also support and improve complex downstream tasks, e.g., information retrieval, machine translation, topic detection, and sentiment analysis. ATE systems, along with annotated datasets, have been studied and developed for decades, but a surge of novel neural systems for the task has emerged recently. Despite the large amount of new research on ATE, systematic survey studies covering novel neural approaches are lacking. We present a comprehensive survey of deep learning-based approaches to ATE, with a focus on Transformer-based neural models. The study also compares these systems with previous ATE approaches, which were based on feature engineering and non-neural supervised learning algorithms.
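To make the task concrete, the following is a minimal, illustrative sketch of the classical (pre-neural) style of ATE the survey contrasts against: n-gram candidates are generated from a tiny example "corpus" and ranked by raw frequency, discarding candidates that begin or end with a stopword. The corpus, stopword list, and frequency scoring here are simplified assumptions for illustration, not any specific system discussed in the survey; real systems typically add linguistic filters (e.g., POS patterns) and statistical termhood measures.

```python
# Toy frequency-based term candidate extraction (illustrative only;
# not a system from the survey).
from collections import Counter

# Hypothetical minimal stopword list for the example.
STOPWORDS = {"a", "an", "the", "of", "for", "is", "are", "and", "to", "in"}

def candidate_terms(corpus, max_len=3):
    """Return candidate terms (1- to max_len-grams) ranked by frequency."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Discard candidates whose edges are stopwords, so that
                # "machine translation" is kept but "of translation" is not.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return counts.most_common()

corpus = [
    "neural machine translation improves machine translation quality",
    "statistical machine translation preceded neural machine translation",
]
for term, freq in candidate_terms(corpus)[:5]:
    print(term, freq)
```

On this toy corpus, multi-word units such as "machine translation" surface near the top of the ranking; the survey's point is that neural (especially Transformer-based) models replace such hand-crafted filters and counts with learned representations.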