We introduce MorphNet, a single model that combines morphological analysis and disambiguation. Traditionally, morphologically complex languages have been analyzed in two stages: (i) a morphological analyzer based on finite-state transducers produces all possible morphological analyses of a word, and (ii) a statistical disambiguation model picks the correct analysis for each word based on its context. MorphNet uses a sequence-to-sequence recurrent neural network to combine analysis and disambiguation. We show that, when trained on text labeled with correct morphological analyses, MorphNet obtains state-of-the-art or comparable results on nine different datasets in seven different languages.
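To make the traditional two-stage pipeline concrete, the following is a minimal sketch, not MorphNet itself and not any particular analyzer. It assumes a hypothetical toy lexicon (`TOY_LEXICON`) in place of a finite-state transducer and hypothetical hand-set tag-bigram scores (`TOY_BIGRAM_SCORES`) in place of a trained statistical disambiguator; the `lemma+Tag` analysis format and the greedy left-to-right decoding are likewise illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for a finite-state analyzer.
# Each surface form maps to every analysis it could have.
TOY_LEXICON: Dict[str, List[str]] = {
    "the": ["the+Det"],
    "she": ["she+Pron"],
    "saw": ["see+Verb+Past", "saw+Noun+Sg"],
}

# Hypothetical hand-set scores standing in for a trained statistical model:
# (previous word's POS tag, candidate POS tag) -> score.
TOY_BIGRAM_SCORES: Dict[Tuple[str, str], float] = {
    ("Det", "Noun"): 2.0,
    ("Pron", "Verb"): 2.0,
}


def analyze(word: str) -> List[str]:
    """Stage (i): return all candidate analyses of a word, ignoring context."""
    return TOY_LEXICON.get(word.lower(), [word + "+Unknown"])


def pos_tag(analysis: str) -> str:
    """Return the POS tag, i.e. the first tag after the lemma."""
    return analysis.split("+")[1]


def disambiguate(words: List[str]) -> List[str]:
    """Stage (ii): greedily pick one analysis per word using the previous tag."""
    chosen: List[str] = []
    prev_tag = "<s>"  # sentence-start marker
    for word in words:
        candidates = analyze(word)
        # Unseen tag pairs score 0.0; ties resolve to the first candidate.
        best = max(
            candidates,
            key=lambda a: TOY_BIGRAM_SCORES.get((prev_tag, pos_tag(a)), 0.0),
        )
        chosen.append(best)
        prev_tag = pos_tag(best)
    return chosen


if __name__ == "__main__":
    print(disambiguate(["she", "saw"]))
    print(disambiguate(["the", "saw"]))
```

In this sketch the ambiguous form "saw" is resolved differently depending on the preceding word, which is the job of stage (ii). A single sequence-to-sequence model, as described in the abstract, would instead map the words in context directly to their analyses without a separate candidate-generation step.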