Multiword expressions (MWEs) are groups of words in which the meaning of the whole is not derived from the meanings of its parts. Processing MWEs is crucial in many natural language processing (NLP) applications, including machine translation and terminology extraction; detecting MWEs is therefore a popular research theme. In this paper, we explore state-of-the-art neural transformers for the task of detecting MWEs. We empirically evaluate several transformer models on the dataset of SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM). We show that transformer models outperform previous neural models based on long short-term memory (LSTM) networks. The code and pre-trained models will be made freely available to the community.