Driven by the rapid growth of user comments on social media and news sites and of online product reviews, sentiment analysis (SA) has attracted substantial interest from researchers. As the field has expanded, SA work aims not only to predict the overall sentiment of a sentence or document but also to provide detail on the sentiment expressed toward its different aspects (i.e., aspect-based sentiment analysis, ABSA). A considerable number of datasets for SA and ABSA have been made available for English and other widely studied European languages. In this paper, we present BAN-ABSA, a high-quality, manually annotated Bengali dataset in which each sample is labeled with an aspect and its associated sentiment by three native Bengali speakers. The dataset consists of 2,619 positive, 4,721 negative, and 1,669 neutral samples from 9,009 unique comments gathered from well-known Bengali news portals. In addition, we conducted a baseline evaluation focused on deep learning models, achieving an accuracy of 78.75% for aspect term extraction and 71.08% for sentiment classification. Experiments on the BAN-ABSA dataset show that the CNN model achieves higher accuracy, whereas Bi-LSTM significantly outperforms the CNN model in terms of average F1-score.
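The gap between accuracy and average F1-score reported above is plausible given how unevenly the three sentiment classes are distributed. The following minimal sketch illustrates the effect using only the class counts stated in the abstract. It assumes that "average F1-score" means the macro average over classes (the abstract does not say which averaging is used), and the always-predict-majority classifier is a hypothetical stand-in, not one of the paper's models.

```python
# Class counts as reported for the BAN-ABSA dataset.
CLASS_COUNTS = {"positive": 2619, "negative": 4721, "neutral": 1669}


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


labels = list(CLASS_COUNTS)
y_true = [label for label, n in CLASS_COUNTS.items() for _ in range(n)]

# Hypothetical classifier that always predicts the most frequent class.
majority = max(CLASS_COUNTS, key=CLASS_COUNTS.get)
y_pred = [majority] * len(y_true)

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels)
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

Under these assumptions the majority-class predictor is right on a little over half of the samples, yet its macro F1 is far lower because two of the three classes receive an F1 of zero. A model that favors the negative class can therefore lead on accuracy while trailing on average F1.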