The complexity of Arabic morphology, orthography, and dialects makes sentiment analysis for Arabic especially challenging. Extracting text features from short messages such as tweets in order to gauge sentiment makes the task harder still. In recent years, deep neural networks have often been employed and have shown very good results in sentiment classification and other natural language processing applications. Word embedding, or distributed word representation, is a current and powerful tool for placing words that occur in similar contexts close together. In this paper, we describe how we construct Word2Vec models from a large Arabic corpus obtained from ten newspapers in different Arab countries. By applying different machine learning algorithms and convolutional neural networks with different text feature selections, we report improved sentiment classification accuracy (91%-95%) on our publicly available Arabic language health sentiment dataset [1].
Keywords - Arabic Sentiment Analysis, Machine Learning, Convolutional Neural Networks, Word Embedding, Word2Vec for Arabic, Lexicon.
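As a rough illustration of how a word embedding model "sees" context, the sketch below builds the (center, context) training pairs used by the skip-gram variant of Word2Vec. The abstract does not state which Word2Vec architecture or window size was used, so the skip-gram choice, the `skipgram_pairs` helper, and the `window` parameter are assumptions made for illustration only, and the tokens are placeholders standing in for tokenized Arabic words.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for each token and its neighbours.

    A neighbour is any other token at most `window` positions away.
    This is a hypothetical helper, not code from the paper.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)                # clip window at sentence start
        stop = min(len(tokens), i + window + 1)   # clip window at sentence end
        for j in range(start, stop):
            if j != i:                            # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Placeholder tokens standing in for one tokenized Arabic sentence.
sentence = ["w0", "w1", "w2", "w3"]
pairs = skipgram_pairs(sentence, window=1)
```

Words that repeatedly share the same context words in such pairs end up with similar vectors after training, which is the property a downstream sentiment classifier can exploit.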