We propose a hierarchical LSTM-based model for named entity recognition on code-switched Twitter data. Our model uses bilingual character representations and transfer learning to address out-of-vocabulary words. To mitigate data noise, we apply token replacement and normalization. In the 3rd Workshop on Computational Approaches to Linguistic Code-Switching Shared Task, we achieved second place with a 62.76% harmonic mean F1-score for the English-Spanish language pair without using any gazetteer or knowledge-based information.
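
As an illustration of what token replacement and normalization can look like for noisy tweets, the following is a minimal sketch under stated assumptions, not the exact preprocessing used in this work. The placeholder strings `<url>` and `<usr>`, the handling of hashtags, and the rule for collapsing repeated characters are all hypothetical choices made for the example.

```python
import re

# Hypothetical patterns and placeholders; the abstract does not specify them.
URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
MENTION_RE = re.compile(r"^@\w+$")
HASHTAG_RE = re.compile(r"^#(\w+)$")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # any character repeated 3+ times


def normalize_token(token: str) -> str:
    """Map one noisy tweet token to a cleaner form."""
    if URL_RE.match(token):
        return "<url>"  # token replacement: every URL shares one symbol
    if MENTION_RE.match(token):
        return "<usr>"  # token replacement: every user mention shares one symbol
    match = HASHTAG_RE.match(token)
    if match:
        token = match.group(1)  # normalization: drop the leading '#'
    # normalization: collapse runs of 3+ identical characters to 2
    return REPEAT_RE.sub(r"\1\1", token)


def normalize_tweet(tokens):
    """Normalize a pre-tokenized tweet, keeping one output token per input token."""
    return [normalize_token(t) for t in tokens]


if __name__ == "__main__":
    tweet = ["@maria", "vamos", "al", "#SuperBowl", "sooooo", "https://t.co/abc"]
    print(normalize_tweet(tweet))
```

The sketch keeps a one-to-one mapping between input and output tokens so that token-level entity labels stay aligned, and it leaves capitalization untouched because case can be a useful cue for entity recognition.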