With the rapid growth of social media use, automatically obtaining users' feedback has become a crucial task for evaluating their tendencies and behaviors online. Despite the wide availability of this information and the increasing number of Arabic-speaking users, only a few studies have addressed Arabic dialects. The purpose of this paper is to study the opinions and emotions expressed in real Moroccan texts, specifically YouTube comments, using well-known and commonly used sentiment analysis methods. We present our work on classifying Moroccan dialect comments using Machine Learning (ML) models, based on a YouTube Moroccan dialect dataset that we collected and manually annotated. Employing several text preprocessing and data representation techniques, we compare classification results across the most commonly used supervised classifiers, namely k-nearest neighbors (KNN), Support Vector Machine (SVM), and Naive Bayes (NB), and deep learning (DL) classifiers such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). Experiments were performed on both raw and preprocessed data to show the importance of preprocessing. The experimental results show that DL models perform better on the Moroccan dialect than classical approaches, and we achieved an accuracy of 90%.
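To make the raw-versus-preprocessed comparison concrete, the following is a minimal sketch only, not the pipeline used in this work. It assumes a hypothetical set of normalization steps (URL removal, stripping of Arabic diacritics and tatweel, collapsing of elongated characters, punctuation removal) and swaps in a small standard-library multinomial Naive Bayes in place of the classifiers evaluated in the paper. The function and class names, and the four toy comments written in Latin-script Moroccan dialect, are invented for illustration and are not drawn from the annotated dataset.

```python
import re
from collections import Counter
from math import log

# Hypothetical normalization rules; the abstract does not list the exact steps.
URL_RE = re.compile(r"https?://\S+")
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0640]")  # Arabic harakat and tatweel
REPEAT_RE = re.compile(r"(.)\1{2,}")                  # runs of 3+ identical characters
NON_WORD_RE = re.compile(r"[^\w\s]")                  # punctuation and symbols


def preprocess(text):
    """Normalize one comment: drop URLs, diacritics, elongation, punctuation."""
    text = URL_RE.sub(" ", text)
    text = DIACRITICS_RE.sub("", text)
    text = REPEAT_RE.sub(r"\1", text)   # crude: collapse each long run to one char
    text = NON_WORD_RE.sub(" ", text)
    return " ".join(text.lower().split())


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for c, n_docs in self.class_counts.items():
            n_words = sum(self.word_counts[c].values())
            score = log(n_docs / total_docs)
            for w in text.split():
                if w in self.vocab:  # tokens unseen in training are ignored
                    score += log((self.word_counts[c][w] + 1)
                                 / (n_words + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


def accuracy(model, texts, labels):
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Toy comments invented for illustration only.
train_texts = ["zwin bzaf", "khayb bzaf", "video zwin", "video khayb"]
train_labels = ["pos", "neg", "pos", "neg"]
test_texts = ["Zwiiiin!!!", "KHAYB..."]
test_labels = ["pos", "neg"]

# Same classifier, trained and evaluated once on raw text and once on
# preprocessed text.
raw_model = NaiveBayes().fit(train_texts, train_labels)
raw_acc = accuracy(raw_model, test_texts, test_labels)

clean_model = NaiveBayes().fit([preprocess(t) for t in train_texts], train_labels)
clean_acc = accuracy(clean_model, [preprocess(t) for t in test_texts], test_labels)

print(f"raw accuracy: {raw_acc:.2f}, preprocessed accuracy: {clean_acc:.2f}")
```

On the raw test comments, elongated and punctuated tokens never match the training vocabulary, so the classifier falls back on the class prior; after normalization the same tokens map onto known words. The toy data only illustrates the mechanism and says nothing about the size of the effect on real comments.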