Morphological tasks use large multilingual datasets that organize words into inflection tables, which then serve as training and evaluation data for various tasks. However, a closer inspection of these data reveals profound cross-linguistic inconsistencies that arise from the lack of a clear linguistic and operational definition of what a word is, and that severely impair the universality of the derived tasks. To overcome this deficiency, we propose to view morphology as a clause-level phenomenon rather than a word-level one. This view is anchored in a fixed yet inclusive set of features that encapsulates all functions realized in a saturated clause. We deliver MightyMorph, a novel dataset for clause-level morphology covering four typologically different languages: English, German, Turkish and Hebrew. We use this dataset to derive three clause-level morphological tasks: inflection, reinflection and analysis. Our experiments show that the clause-level tasks are substantially harder than the respective word-level tasks, while having comparable complexity across languages. Furthermore, redefining morphology at the clause level provides a neat interface with contextualized language models (LMs) and allows assessing the morphological knowledge encoded in these models and their usability for morphological tasks. Taken together, this work opens up new horizons in the study of computational morphology, leaving ample space for studying neural morphology cross-linguistically.
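To make the three tasks concrete, the following is a minimal sketch of their input/output shapes over a toy clause-level inflection table. It is an illustration only: the feature labels (`IND`, `PST`, `NOM(3,SG,MASC)`, and so on), the table entries, and the function names are assumptions made for this example and are not taken from the MightyMorph schema or data. A plain dictionary lookup stands in for whatever trained model would actually perform each task.

```python
# Hypothetical toy table: (lemma, feature bundle) -> full clause.
# Feature labels are illustrative only, not the MightyMorph schema.
TABLE = {
    ("give", frozenset({"IND", "PST", "NOM(3,SG,MASC)"})): "he gave",
    ("give", frozenset({"IND", "FUT", "NOM(3,SG,MASC)"})): "he will give",
    ("give", frozenset({"IND", "FUT", "NEG", "NOM(1,PL)"})): "we will not give",
}


def inflect(lemma, features):
    """Inflection: lemma + target features -> clause."""
    return TABLE[(lemma, frozenset(features))]


def analyze(clause):
    """Analysis: clause -> every (lemma, features) pair that yields it."""
    return [
        (lemma, set(feats))
        for (lemma, feats), form in TABLE.items()
        if form == clause
    ]


def reinflect(source_clause, target_features):
    """Reinflection: inflected clause + target features -> new clause.

    In this sketch the lemma is recovered by table lookup; a real
    system would map between forms without access to the table.
    """
    lemma, _ = analyze(source_clause)[0]
    return inflect(lemma, target_features)
```

Here the unit being inflected is the whole clause, so material such as the subject pronoun, the auxiliary, and negation falls inside the inflected form instead of being split off by whitespace.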