A movie that one person thoroughly enjoys and recommends may be hated by another. Humans characteristically have feelings, which can be positive or negative. Sentiment analysis and opinion mining, a branch of natural language processing, were designed to classify and study these feelings automatically, and to understand how people feel about topics such as products, social media platforms, government, societal discussions, or even movies. Much work on sentiment analysis has been done for high-resource languages, while low-resource languages such as Yoruba have been sidelined. Because datasets and linguistic architectures suited to low-resource languages are scarce, African languages have been largely ignored and remain underexplored. For this reason, we focus on Yoruba and explore sentiment analysis on reviews of Nigerian movies. The data comprise 1500 movie reviews sourced from IMDB, Rotten Tomatoes, Letterboxd, Cinemapointer, and Nollyrated. We develop sentiment classification models that use state-of-the-art pre-trained language models such as mBERT and AfriBERTa to classify the movie reviews.