Common designs of model evaluation typically focus on monolingual settings, where different models are compared according to their performance on a single data set that is assumed to be representative of all possible data for the task at hand. While this may be reasonable for a large data set, the assumption is difficult to maintain in low-resource scenarios, where artifacts of the data collection process can yield data sets that are outliers, potentially making conclusions about model performance coincidental. To address these concerns, we investigate model generalizability in crosslinguistic low-resource scenarios. Using morphological segmentation as the test case, we compare three broad classes of models with different parameterizations, taking data from 11 languages across 6 language families. In each experimental setting, we evaluate all models on an initial data set, then examine their performance consistency when introducing new randomly sampled data sets of the same size and when applying the trained models to unseen test sets of varying sizes. The results demonstrate that the extent of model generalization depends on the characteristics of the data set and does not necessarily rely heavily on the data set size. Among the characteristics we studied, the morpheme-overlap ratio between the training and test sets and the ratio of their average numbers of morphemes per word are the two most prominent factors. Our findings suggest that future work should adopt random sampling to construct data sets of different sizes in order to make more responsible claims about model evaluation.
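As an illustration only (not the authors' implementation), the sketch below shows one plausible way to compute the two data set characteristics named above, assuming each split is represented as a list of words, where each word is a list of its gold-standard morphemes; the function names and the toy data are hypothetical.

```python
# Hedged sketch: computing the two data set characteristics highlighted above,
# assuming each split is a list of segmented words, e.g. [["un", "kind", "ness"], ...].
# This is illustrative code, not the paper's actual pipeline.

def morpheme_overlap_ratio(train, test):
    """Fraction of morpheme types in the test set that also occur in training."""
    train_types = {m for word in train for m in word}
    test_types = {m for word in test for m in word}
    if not test_types:
        return 0.0
    return len(test_types & train_types) / len(test_types)

def avg_morphemes_per_word(split):
    """Mean number of morphemes per word in a split."""
    return sum(len(word) for word in split) / len(split)

def morphemes_per_word_ratio(train, test):
    """Ratio of the average number of morphemes per word, train vs. test."""
    return avg_morphemes_per_word(train) / avg_morphemes_per_word(test)

# Toy example (hypothetical segmentations):
train = [["un", "kind", "ness"], ["kind", "ly"], ["walk", "ed"]]
test = [["un", "walk", "able"], ["kind", "ness"]]
print(morpheme_overlap_ratio(train, test))    # 0.8
print(morphemes_per_word_ratio(train, test))  # ~0.93
```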