Part of speech and gramset tagging algorithms for unknown words based on morphological dictionaries of the Veps and Karelian languages

This research is devoted to the low-resource Veps and Karelian languages. The article presents algorithms for assigning part-of-speech tags and sets of grammatical features to words. These algorithms use our morphological dictionaries, in which the lemma, part of speech, and set of grammatical features (gramset) are known for each word form. The algorithms rest on the analogy hypothesis: words with the same suffixes are likely to share the same inflectional model, part of speech, and gramset. The accuracy of the algorithms was evaluated and compared on 313 thousand Vepsian and 66 thousand Karelian words, using special functions designed to assess the quality of the results. The developed algorithms assigned the correct part of speech to 92.4% of Vepsian words and 86.8% of Karelian words, and the correct gramset to 95.3% of Vepsian words and 90.7% of Karelian words. The paper also describes the morphological and semantic tagging of texts, two processes that are closely related and inseparable in our corpus.
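The analogy hypothesis in the abstract can be illustrated with a minimal sketch: for an unknown word, find the longest suffix it shares with dictionary forms and let those forms vote on the part of speech or gramset. This is not the authors' implementation. The toy dictionary, its invented forms and tags, the function names `predict` and `leave_one_out_accuracy`, and the majority-vote and leave-one-out choices are all hypothetical assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy dictionary: word form -> (part of speech, gramset).
# The forms and tags are invented for illustration; they are NOT entries
# from the Veps or Karelian morphological dictionaries.
DICTIONARY = {
    "kalad": ("NOUN", "nominative plural"),
    "kalan": ("NOUN", "genitive singular"),
    "mecad": ("NOUN", "nominative plural"),
    "mecan": ("NOUN", "genitive singular"),
    "lugeda": ("VERB", "infinitive"),
    "pageta": ("VERB", "infinitive"),
}
POS, GRAMSET = 0, 1  # index into the (pos, gramset) tuple


def predict(word, dictionary, field, min_suffix=1):
    """Guess `field` for `word` from the dictionary forms sharing its longest suffix.

    Suffix lengths are tried from longest to shortest; the first length with
    any matching forms decides, and the most frequent tag among them wins.
    The word itself is skipped so the function also works for evaluation.
    """
    for k in range(len(word), min_suffix - 1, -1):
        suffix = word[-k:]
        votes = Counter(
            tags[field]
            for form, tags in dictionary.items()
            if form != word and form.endswith(suffix)
        )
        if votes:
            return votes.most_common(1)[0][0]
    return None  # no dictionary form shares even a min_suffix-long ending


def leave_one_out_accuracy(dictionary, field):
    """Share of forms whose tag is predicted correctly when the form is hidden."""
    correct = sum(
        predict(form, dictionary, field) == tags[field]
        for form, tags in dictionary.items()
    )
    return correct / len(dictionary)


if __name__ == "__main__":
    print(predict("talod", DICTIONARY, POS))          # NOUN
    print(predict("talod", DICTIONARY, GRAMSET))      # nominative plural
    print(leave_one_out_accuracy(DICTIONARY, POS))    # 1.0
```

The linear scan over the dictionary keeps the sketch short; for dictionaries of hundreds of thousands of forms, an index over reversed word forms (for example, a suffix trie) would avoid rescanning every entry for each suffix length.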


Related content

Part-of-speech tagging


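Part of speech is a basic grammatical property of a word, also commonly called its word class. Part-of-speech tagging is the process of determining the grammatical category of each word in a given sentence, identifying its part of speech, and labeling it accordingly; it is an important foundational problem in Chinese information processing. In corpus linguistics, part-of-speech tagging (POS tagging, PoS tagging, or POST), also called grammatical tagging, is the process of marking up the words in a text (corpus) as corresponding to a particular part of speech,[1] based on both their definition and their context.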
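The point that a tag depends on both a word's definition and its context can be illustrated with a deliberately tiny tagger: a lexicon lists each word's possible tags, and one context rule resolves ambiguity. The lexicon, the tag names, the determiner rule, and the fallback preference order are hypothetical choices for this sketch; practical taggers learn such preferences from annotated corpora.

```python
# Hypothetical lexicon: each word maps to the set of tags it can take.
LEXICON = {
    "the": {"DET"},
    "a": {"DET"},
    "dog": {"NOUN"},
    "barks": {"VERB", "NOUN"},
    "can": {"AUX", "NOUN"},
    "run": {"VERB", "NOUN"},
}
# Hypothetical fallback order used when context does not decide.
PREFERENCE = ("AUX", "VERB", "NOUN", "DET")


def tag(tokens):
    """Return (token, tag) pairs using lexicon lookup plus one context rule."""
    tagged = []
    for token in tokens:
        candidates = LEXICON.get(token.lower(), {"UNK"})  # unknown words get UNK
        if len(candidates) == 1:
            choice = next(iter(candidates))  # unambiguous: the definition decides
        elif tagged and tagged[-1][1] == "DET" and "NOUN" in candidates:
            choice = "NOUN"  # context: a determiner is usually followed by a noun
        else:
            choice = next(t for t in PREFERENCE if t in candidates)
        tagged.append((token, choice))
    return tagged


print(tag("the dog barks".split()))
print(tag("a dog can run".split()))
print(tag("the run".split()))
```

The same word "run" receives VERB after "can" but NOUN after "the", which is the context dependence the definition describes; a word missing from the lexicon is the unknown-word case that suffix-based methods aim to handle.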