Although research on emotion classification has progressed significantly for high-resource languages, it is still in its infancy for resource-constrained languages such as Bengali. The unavailability of necessary language processing tools and the scarcity of benchmark corpora make emotion classification in Bengali more challenging. This work proposes a transformer-based technique to classify Bengali text into one of six basic emotions: anger, fear, disgust, sadness, joy, and surprise. A Bengali emotion corpus consisting of 6243 texts is developed for the classification task. Experiments are carried out using various machine learning (LR, RF, MNB, SVM), deep neural network (CNN, BiLSTM, CNN+BiLSTM), and transformer (Bangla-BERT, m-BERT, XLM-R) based approaches. The results indicate that XLM-R outperforms all other techniques, achieving the highest weighted $f_1$-score of $69.73\%$ on the test data. The dataset is publicly available at https://github.com/omar-sharif03/NAACL-SRW-2021.
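For readers unfamiliar with the reported metric, the weighted $f_1$-score is usually computed as the average of the per-class $f_1$-scores, each weighted by the number of true instances of that class. The following is a minimal sketch of that standard definition over the six emotion labels named above. The function name `weighted_f1` and the toy labels are illustrative assumptions, not the authors' evaluation code or data.

```python
from collections import Counter

# The six emotion classes named in the abstract.
EMOTIONS = ["anger", "fear", "disgust", "sadness", "joy", "surprise"]


def weighted_f1(y_true, y_pred, labels=EMOTIONS):
    """Support-weighted average of per-class F1 (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = len(y_true)
    if total == 0:
        raise ValueError("cannot score an empty set of labels")

    support = Counter(y_true)  # number of true instances per class
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * support[label] / total
    return score


# Toy example (made-up labels, not drawn from the corpus).
toy_true = ["joy", "joy", "anger", "fear"]
toy_pred = ["joy", "anger", "anger", "fear"]
toy_score = weighted_f1(toy_true, toy_pred)
```

Because each class contributes in proportion to its support, frequent emotions influence this score more than rare ones, which matters when the class distribution is imbalanced.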