This work presents a morphological analyzer for the Uzbek language based on finite state machines (FSMs). The proposed method analyzes Uzbek words by stripping affixes to find the root, without using any lexicon. Because no vocabulary has to be kept in memory, the method can analyze words from large amounts of text at high speed. Uzbek is an agglutinative language, so its morphology can be modeled with FSMs. In contrast to previous work, this study models complete FSMs for all word classes, using the morphotactic rules of Uzbek in right-to-left order. The paper describes the stages of the methodology: classifying the affixes, generating an FSM for each affix class, and combining these FSMs into a head machine that analyzes a word.
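To make the idea of lexicon-free, right-to-left affix stripping concrete, the following is a minimal sketch for nouns only. It is not the paper's implementation: the affix lists are a small illustrative subset, and the state names, the `MIN_ROOT_LEN` threshold, and the greedy longest-match strategy are assumptions made for this example.

```python
# Illustrative subset of Uzbek noun affixes, grouped by class.
# Assumption: this is NOT the paper's full affix inventory.
AFFIXES = {
    "CASE": ["ning", "dan", "ni", "ga", "da"],
    "POSS": ["imiz", "ingiz", "im", "ing", "i"],
    "PLURAL": ["lar"],
}

# Right-to-left transitions of a hypothetical noun FSM: from each state,
# the affix classes that may still be stripped from the end of the stem.
# Surface order is root + PLURAL + POSS + CASE, so stripping runs in reverse.
TRANSITIONS = {
    "START": ["CASE", "POSS", "PLURAL"],
    "CASE": ["POSS", "PLURAL"],
    "POSS": ["PLURAL"],
    "PLURAL": [],
}

# Assumed guard against stripping a word down to nothing.
MIN_ROOT_LEN = 2


def analyze(word):
    """Strip affixes from the right; return (root, [(class, affix), ...])."""
    state = "START"
    stem = word.lower()
    stripped = []
    while True:
        match = None
        # Among the classes allowed in this state, pick the longest affix
        # that ends the stem and leaves a long enough remainder.
        for cls in TRANSITIONS[state]:
            for affix in AFFIXES[cls]:
                if stem.endswith(affix) and len(stem) - len(affix) >= MIN_ROOT_LEN:
                    if match is None or len(affix) > len(match[1]):
                        match = (cls, affix)
        if match is None:
            break
        cls, affix = match
        stem = stem[: -len(affix)]
        stripped.append((cls, affix))
        state = cls
    stripped.reverse()  # report affixes in surface (left-to-right) order
    return stem, stripped


if __name__ == "__main__":
    print(analyze("kitoblarimni"))  # kitob + lar + im + ni
```

Because no lexicon is consulted, a sketch like this can over-strip roots that happen to end in an affix-like string; a fuller set of morphotactic rules per word class would be needed to narrow down such cases.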