Previous Part-of-Speech (POS) induction models usually make certain independence assumptions (e.g., Markov, unidirectional, local dependency) that do not hold in real languages. For example, subject-verb agreement can be both long-term and bidirectional. To facilitate flexible dependency modeling, we propose a Masked Part-of-Speech Model (MPoSM), inspired by the recent success of Masked Language Models (MLM). MPoSM can model arbitrary tag dependency and performs POS induction through the objective of masked POS reconstruction. We achieve competitive results on both the English Penn WSJ dataset and the universal treebank containing 10 diverse languages. Though modeling long-term dependency should ideally help this task, our ablation study shows mixed trends across languages. To better understand this phenomenon, we design a novel synthetic experiment that specifically diagnoses the model's ability to learn tag agreement. Surprisingly, we find that even strong baselines fail to solve this problem consistently in a very simplified setting: agreement between adjacent words. Nonetheless, MPoSM achieves overall better performance. Lastly, we conduct a detailed error analysis to shed light on other remaining challenges. Our code is available at https://github.com/owenzx/MPoSM
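A masked-reconstruction objective over tags can be illustrated with a small toy sketch. This is not the MPoSM architecture; every name here is hypothetical. It assumes each token already has a soft tag posterior `q[i]` (from some unspecified local tagger). One position is hidden, and that position's tag distribution is predicted from all other positions, to the left and right and at any distance, through a hypothetical tag-compatibility matrix `W`. The prediction is then scored with cross-entropy against the hidden posterior.

```python
import math
from typing import List, Sequence

Dist = List[float]  # a probability distribution over K tags


def softmax(scores: Sequence[float]) -> Dist:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def predict_masked_tag(q: Sequence[Dist], i: int, W: Sequence[Sequence[float]]) -> Dist:
    """Predict a tag distribution for masked position i from all OTHER positions.

    Hypothetical context model: every unmasked token j (left or right, at any
    distance) adds sum_a q[j][a] * W[a][t] to the score of tag t.
    """
    k = len(W)
    scores = [0.0] * k
    for j, qj in enumerate(q):
        if j == i:
            continue  # position i is masked, so its own tag posterior is hidden
        for a in range(k):
            for t in range(k):
                scores[t] += qj[a] * W[a][t]
    return softmax(scores)


def masked_reconstruction_loss(q: Sequence[Dist], masked: Sequence[int],
                               W: Sequence[Sequence[float]]) -> float:
    """Average cross-entropy between each masked position's posterior q[i]
    (the reconstruction target) and the context-based prediction."""
    total = 0.0
    for i in masked:
        p = predict_masked_tag(q, i, W)
        total -= sum(qt * math.log(pt) for qt, pt in zip(q[i], p))
    return total / len(masked)


if __name__ == "__main__":
    # Two tags; W (an assumption) rewards context tags that agree with the target tag.
    W = [[2.0, -2.0], [-2.0, 2.0]]
    q = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
    # The left and right neighbours disagree, so the prediction for position 1 is uniform.
    print(round(masked_reconstruction_loss(q, [1], W), 4))  # → 0.6931
```

Because the context score sums over every other position regardless of distance or direction, this toy predictor is not tied to Markov or unidirectional assumptions. Unlike a Transformer-style encoder, however, it ignores word order entirely, so it cannot distinguish which neighbour a tag should agree with.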