The evolution of the Internet has increased the amount of information that people express on different platforms. This information can take the form of product reviews, forum discussions, or social media posts. The accessibility of these opinions and of people's feelings opens the door to opinion mining and sentiment analysis. As language and speech technologies have advanced, strong models have been developed for many languages. However, due to linguistic diversity and a lack of datasets, African languages have been left behind. In this study, we use the current state-of-the-art model, multilingual BERT, to perform sentiment classification on Swahili datasets. The data was created by extracting and annotating 8.2k reviews and comments from different social media platforms and the ISEAR emotion dataset. Each example was labeled as either positive or negative. The model was fine-tuned and achieved a best accuracy of 87.59%.