Morphology Without Borders: Clause-Level Morphology

Morphological tasks use large multilingual datasets that organize words into inflection tables, which then serve as training and evaluation data for various tasks. However, a closer inspection of these data reveals profound cross-linguistic inconsistencies that arise from the lack of a clear linguistic and operational definition of what a word is, and that severely impair the universality of the derived tasks. To overcome this deficiency, we propose to view morphology as a clause-level phenomenon rather than a word-level one. It is anchored in a fixed yet inclusive set of features that encapsulates all functions realized in a saturated clause. We deliver MightyMorph, a novel dataset for clause-level morphology covering four typologically different languages: English, German, Turkish, and Hebrew. We use this dataset to derive three clause-level morphological tasks: inflection, reinflection, and analysis. Our experiments show that the clause-level tasks are substantially harder than the respective word-level tasks, while having comparable complexity across languages. Furthermore, redefining morphology at the clause level provides a neat interface with contextualized language models (LMs) and allows assessing the morphological knowledge encoded in these models and their usability for morphological tasks. Taken together, this work opens up new horizons in the study of computational morphology, leaving ample space for studying neural morphology cross-linguistically.
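To make the three tasks concrete, the following is a minimal sketch of how clause-level inflection, reinflection, and analysis relate to one another as input/output mappings. It is not the MightyMorph format or the authors' models: the feature labels (`PRS`, `FUT`, `NEG`, `NOM:3SG.F`), the `ClauseEntry` type, and the tiny hand-written table are all hypothetical, and a plain lookup stands in for a learned system.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class ClauseEntry:
    """One cell of a toy clause-level inflection table (hypothetical schema)."""
    lemma: str
    features: FrozenSet[str]
    form: str


# Hand-written toy table; the feature labels are illustrative assumptions,
# not the feature set used in the paper.
TABLE = [
    ClauseEntry("give", frozenset({"PRS", "NOM:3SG.F"}), "she gives"),
    ClauseEntry("give", frozenset({"FUT", "NOM:3SG.F"}), "she will give"),
    ClauseEntry("give", frozenset({"FUT", "NEG", "NOM:3SG.F"}), "she will not give"),
]


def inflect(lemma: str, features: Iterable[str]) -> str:
    """Inflection: lemma + feature bundle -> clause form."""
    wanted = frozenset(features)
    for entry in TABLE:
        if entry.lemma == lemma and entry.features == wanted:
            return entry.form
    raise KeyError((lemma, sorted(wanted)))


def analyze(form: str) -> List[Tuple[str, FrozenSet[str]]]:
    """Analysis: clause form -> every (lemma, feature bundle) that yields it."""
    return [(e.lemma, e.features) for e in TABLE if e.form == form]


def reinflect(source_form: str, target_features: Iterable[str]) -> str:
    """Reinflection: inflected form + target features -> target form.

    In this sketch the lemma is recovered by table lookup and then inflected.
    """
    analyses = analyze(source_form)
    if not analyses:
        raise KeyError(source_form)
    lemma, _ = analyses[0]
    return inflect(lemma, target_features)


if __name__ == "__main__":
    print(inflect("give", {"FUT", "NEG", "NOM:3SG.F"}))
    print(reinflect("she gives", {"FUT", "NOM:3SG.F"}))
    print(analyze("she will give"))
```

The point of the sketch is only the shape of the tasks: at the clause level, the target of inflection is a multi-word string such as "she will not give" rather than a single word form, and the feature bundle has to carry everything the clause expresses, such as tense, negation, and argument features.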

