Idiomatic expressions are an integral part of natural language, and new ones are constantly being added. Owing to their non-compositionality and their ability to take on a figurative or literal meaning depending on the sentential context, they have been a classical challenge for NLP systems. To address this challenge, we study the task of detecting whether a sentence contains an idiomatic expression and localizing it. Prior work on this task has studied specific classes of idiomatic expressions, offering a limited view of how well the methods generalize to new idioms. We propose a multi-stage neural architecture with an attention flow mechanism for identifying these expressions. The network effectively fuses contextual and lexical information at different levels using word and sub-word representations. Empirical evaluations on three of the largest benchmark datasets, which contain idiomatic expressions of varied syntactic patterns and degrees of non-compositionality, show that our proposed model achieves new state-of-the-art results. A salient feature of the model is its ability to identify idioms unseen during training, with gains from 1.4% to 30.8% over competitive baselines on the largest dataset.