Recognizing and categorizing lexical collocations in context is useful for language learning, dictionary compilation and downstream NLP. However, it is a challenging task due to the varying degrees of frozenness lexical collocations exhibit. In this paper, we put forward a BERT-based sequence tagging model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context. Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights into differences in collocation typification in English, Spanish and French.