We present several neural networks for named entity recognition (NER) in morphologically complex languages (MCLs). Kazakh is a morphologically complex language in which each root or stem can produce hundreds or thousands of word forms. This property can cause severe data sparsity, which may prevent deep learning models from being trained well for under-resourced MCLs. To model the words of MCLs effectively, we add root embeddings, entity tag embeddings, and a tensor layer to the neural networks. These components significantly improve NER performance for MCLs. The proposed models outperform state-of-the-art approaches, including character-based ones, and could potentially be applied to other morphologically complex languages.
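The abstract does not specify how the root embeddings and the tensor layer are combined. The following is a minimal sketch under stated assumptions, not the authors' implementation. It shows two ideas. First, inflected forms of one root share a single root vector, which counters sparsity. Second, a neural-tensor-style layer computes h_k = tanh(xᵀ T_k x + V_k x + b_k). The toy lexicon `ROOT_OF`, the dimensions, the concatenation scheme, and the helper names `token_vector` and `tensor_layer` are all hypothetical. In practice the roots would come from a morphological analyser, which is also an assumption. Entity tag embeddings are omitted because the abstract does not describe them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy lexicon: inflected Kazakh forms mapped to their root.
# Assumption: a real system would obtain roots from a morphological analyser.
ROOT_OF = {
    "Қазақстан": "Қазақстан",
    "Қазақстанда": "Қазақстан",    # locative: "in Kazakhstan"
    "Қазақстанның": "Қазақстан",   # genitive: "of Kazakhstan"
    "Қазақстаннан": "Қазақстан",   # ablative: "from Kazakhstan"
}

WORD_DIM, ROOT_DIM, HIDDEN = 4, 3, 2  # illustrative sizes only

words = sorted(ROOT_OF)
roots = sorted(set(ROOT_OF.values()))
word_emb = {w: rng.normal(size=WORD_DIM) for w in words}  # one vector per surface form
root_emb = {r: rng.normal(size=ROOT_DIM) for r in roots}  # one vector per root, shared

def token_vector(word):
    """Concatenate the word's own embedding with its shared root embedding."""
    return np.concatenate([word_emb[word], root_emb[ROOT_OF[word]]])

D = WORD_DIM + ROOT_DIM
T = rng.normal(size=(HIDDEN, D, D)) * 0.1  # one D x D bilinear slice per output unit
V = rng.normal(size=(HIDDEN, D)) * 0.1     # ordinary linear term
b = np.zeros(HIDDEN)

def tensor_layer(x):
    """Neural-tensor-style layer (assumed form): h_k = tanh(x^T T_k x + V_k x + b_k)."""
    bilinear = np.einsum("i,kij,j->k", x, T, x)  # x^T T_k x for every slice k
    return np.tanh(bilinear + V @ x + b)

h = tensor_layer(token_vector("Қазақстанда"))
print(h.shape)
```

A rare form such as `Қазақстаннан` still receives a well-trained root component, because every form of `Қазақстан` updates the same root vector during training. The bilinear term lets the layer model multiplicative interactions between the word and root components, which a purely linear layer cannot capture.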