Code-switching is a common phenomenon among people with diverse linguistic backgrounds and is widely used on the internet for communication. In this paper, we present a Recurrent Neural Network combined with an attention model for language identification in code-switched data in English and low-resource Roman Urdu. The attention model enables the architecture to learn the important features of each language, thereby classifying the code-switched data at the token level. We demonstrate our approach by comparing its results with state-of-the-art models, i.e., Hidden Markov Models, Conditional Random Fields, and Bidirectional LSTMs. Evaluation using confusion-matrix metrics shows that the attention mechanism yields improved precision and accuracy compared to the other models.
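For concreteness, the sketch below shows one way such an attention-augmented recurrent architecture for token-level language identification could be wired together. It is a minimal PyTorch sketch under stated assumptions, not the paper's exact implementation: the embedding size, hidden size, label set (English, Roman Urdu, other), and the additive-attention variant are all hypothetical choices, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class AttentiveLangID(nn.Module):
    """BiLSTM with additive attention for token-level language ID.

    Hypothetical sketch: hyperparameters and the attention variant are
    assumptions, as the abstract does not specify them.
    """
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_labels=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Additive attention scores over the BiLSTM hidden states.
        self.attn_score = nn.Linear(2 * hidden_dim, 1)
        # Each token is classified from its own hidden state concatenated
        # with the attention-weighted summary of the whole utterance.
        self.classifier = nn.Linear(4 * hidden_dim, num_labels)

    def forward(self, token_ids):                   # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))     # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn_score(h), dim=1)   # over the sequence
        context = (weights * h).sum(dim=1, keepdim=True)     # (batch, 1, 2*hidden)
        context = context.expand(-1, h.size(1), -1)          # broadcast per token
        return self.classifier(torch.cat([h, context], dim=-1))  # per-token logits

# Usage: 2 sentences of 12 (integer-encoded) tokens each,
# 3 labels (e.g., English / Roman Urdu / other).
model = AttentiveLangID(vocab_size=20000)
logits = model(torch.randint(1, 20000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 3])
```

The attention weights give the model a learned, utterance-level view of which tokens carry the strongest language signal, which is the intuition behind the improved precision reported above.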