This work presents a detailed comparison of the performance of deep learning models, namely convolutional neural networks (CNN), long short-term memory (LSTM), gated recurrent units (GRU), and their hybrids, against a selection of shallow learning classifiers for sentiment analysis of Arabic reviews. The comparison also includes state-of-the-art models such as the transformer architecture and the AraBERT pre-trained model. The datasets used in this study are multi-dialect Arabic hotel and book review datasets, which are among the largest publicly available datasets for Arabic reviews. The results showed that deep learning outperformed shallow learning for both binary and multi-label classification, in contrast with the results of similar work reported in the literature. We attribute this discrepancy to dataset size, which we found to be proportional to the performance of the deep learning models. The performance of the deep and shallow learning techniques was analyzed in terms of accuracy and F1 score. The best performing shallow learning technique was Random Forest, followed by Decision Tree and AdaBoost. The deep learning models performed similarly when using a default embedding layer, while the transformer model performed best when augmented with AraBERT.
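The two evaluation metrics named above, accuracy and F1 score, can be illustrated with a minimal sketch for the binary case. This is not the study's evaluation code (the authors presumably used a standard library implementation); it is a self-contained illustration of how the two metrics are computed from predicted and true labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives means precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: 4 reviews, one positive review misclassified as negative.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(accuracy(y_true, y_pred))  # 0.75
print(f1_score(y_true, y_pred))  # ~0.667
```

For multi-label classification the F1 score is typically averaged across classes (macro or weighted averaging); the abstract does not specify which variant was reported.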