Social media has penetrated into multilingual societies, however most of them use English to be a preferred language for communication. So it looks natural for them to mix their cultural language with English during conversations resulting in abundance of multilingual data, call this code-mixed data, available in todays' world.Downstream NLP tasks using such data is challenging due to the semantic nature of it being spread across multiple languages.One such Natural Language Processing task is sentiment analysis, for this we use an auto-regressive XLNet model to perform sentiment analysis on code-mixed Tamil-English and Malayalam-English datasets.
翻译:社交媒体已渗透到多语言社会, 但大部分社会媒体都使用英语作为沟通的首选语言。 因此,在对话期间,他们自然会将自己的文化语言与英语混为一谈, 从而产生大量多语言数据, 称之为今天世界上可用的代码混合数据。 使用这些数据的下流国家语言方案任务具有挑战性, 因为它在多种语言中传播的语义性质。 一种这样的自然语言处理任务就是情感分析, 因为我们使用自动回归的 XLNet 模式来对代码混合的泰米尔语英语和马来亚拉姆语英语数据集进行情感分析。