Hypernym and synonym matching is a mainstream Natural Language Processing (NLP) task. In this paper, we present systems that attempt to solve this problem. We designed these systems to participate in FinSim-3, a shared task of the FinNLP workshop at IJCAI-2021. The shared task focuses on solving this problem for the financial domain. We experimented with various transformer-based pre-trained embeddings, fine-tuning them for either classification or phrase-similarity tasks. We also augmented the provided dataset with abbreviations derived from prospectuses provided by the organizers, and with definitions of the financial terms from DBpedia [Auer et al., 2007], Investopedia, and the Financial Industry Business Ontology (FIBO). Our best-performing system uses both FinBERT [Araci, 2019] and data augmentation from the aforementioned sources. We observed that term expansion using data augmentation, in conjunction with semantic similarity, is beneficial for this task and could be useful for other tasks that deal with short phrases. Our best-performing model (Accuracy: 0.917, Rank: 1.156) was developed by fine-tuning SentenceBERT [Reimers et al., 2019] (with FinBERT at the backend) over an extended labelled set created using the hierarchy of labels present in FIBO.