Sentiment analysis is a widely studied NLP task in which the goal is to determine the opinions, emotions, and evaluations that users express towards a product, entity, or service they are reviewing. One of the biggest challenges for sentiment analysis is that it is highly language dependent: word embeddings, sentiment lexicons, and even annotated data are language specific. Further, optimizing models for each language is time-consuming and labor-intensive, especially for recurrent neural network models. From a resource perspective, collecting data for different languages is also very challenging. In this paper, we look for an answer to the following research question: can a sentiment analysis model trained on one language be reused for sentiment analysis in other languages (Russian, Spanish, Turkish, and Dutch) where data is more limited? Our goal is to build a single model in the language with the largest dataset available for the task, and to reuse it for languages that have limited resources. For this purpose, we train a sentiment analysis model using recurrent neural networks on reviews in English. We then translate reviews written in the other languages into English and reuse this model to evaluate their sentiment. Experimental results show that our robust approach, a single model trained on English reviews, outperforms the baselines with statistical significance in several different languages.
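The translate-then-classify pipeline described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `translate_to_english` is a toy word-level glossary standing in for a machine translation system, and `english_sentiment` is a keyword scorer standing in for the English-trained recurrent neural network. The only point is the structure: one English model, with translation placed in front of it for every other language.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a machine translation system: a tiny word-level
# glossary. A real pipeline would call an actual translation system here.
TOY_GLOSSARY: Dict[str, Dict[str, str]] = {
    "es": {"muy": "very", "bueno": "good", "malo": "bad", "servicio": "service"},
    "nl": {"erg": "very", "goed": "good", "slecht": "bad", "eten": "food"},
}


def translate_to_english(text: str, source_lang: str) -> str:
    """Toy word-by-word translation; unknown words pass through unchanged."""
    glossary = TOY_GLOSSARY.get(source_lang, {})
    return " ".join(glossary.get(token, token) for token in text.lower().split())


# Hypothetical stand-in for the single English-trained model. The paper uses a
# recurrent neural network; this keyword scorer only marks where it plugs in.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def english_sentiment(text: str) -> str:
    """Label English text by counting positive and negative keywords."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    # Ties default to "positive" purely to keep the sketch binary.
    return "positive" if score >= 0 else "negative"


def classify_review(
    text: str,
    source_lang: str,
    model: Callable[[str], str] = english_sentiment,
) -> str:
    """Translate a review into English if needed, then apply the English model."""
    english = text if source_lang == "en" else translate_to_english(text, source_lang)
    return model(english)
```

For example, `classify_review("servicio muy bueno", "es")` first maps the Spanish review to "service very good" and then scores it with the same function used for English input. Passing the model as a parameter keeps the translation step independent of whichever English classifier is used.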