Compounding is ubiquitous in Sanskrit. It allows thoughts to be expressed concisely while enriching the lexical and structural makeup of the language. In this work, we focus on the Sanskrit Compound Type Identification (SaCTI) task, which consists of identifying the semantic relation between the components of a compound word. Earlier approaches rely solely on lexical information obtained from the components and ignore the contextual and syntactic information that is crucial for SaCTI. The task is challenging primarily because the semantic relation between compound components is implicitly encoded and context-sensitive. We therefore propose a novel multi-task learning architecture that incorporates contextual information and enriches it with complementary syntactic information, using morphological tagging and dependency parsing as two auxiliary tasks. Experiments on the benchmark datasets for SaCTI show absolute gains of 6.1 points (accuracy) and 7.7 points (F1-score) over the state-of-the-art system. Further, our multilingual experiments demonstrate the efficacy of the proposed architecture on English and Marathi. The code and datasets are publicly available at https://github.com/ashishgupta2598/SaCTI