Persian Emotion Detection using ParsBERT and Imbalanced Data Handling Approaches

Emotion recognition is a machine learning application that can be performed on text, speech, or image data gathered from social media. Detecting emotion is useful in several fields, including opinion mining. With the spread of social media, platforms such as Twitter have become data sources, and the informal language used on these platforms makes emotion detection difficult. EmoPars and ArmanEmo are two new human-labeled emotion datasets for the Persian language. These datasets, especially EmoPars, suffer from an imbalance in the number of samples per class. In this paper, we evaluate EmoPars and compare it with ArmanEmo. Throughout this analysis, we use data augmentation, data re-sampling, and class weights with Transformer-based Pretrained Language Models (PLMs) to handle the imbalance problem in these datasets. Moreover, feature selection is used to enhance the models' performance by emphasizing specific features of the text. In addition, we provide a new policy for selecting data from EmoPars that keeps only the high-confidence samples; as a result, the model is not trained on samples that lack a specific emotion. Our model reaches a macro-averaged F1-score of 0.81 on ArmanEmo and 0.76 on EmoPars, which are new state-of-the-art results on these benchmarks.
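As a rough illustration of two ideas named in the abstract, class weights and the macro-averaged F1-score, the following is a minimal sketch and not the authors' implementation. The abstract does not state which weighting formula is used; the inverse-frequency ("balanced") heuristic below is an assumption, and the function names and the label names `anger` and `fear` are hypothetical.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count[c]).

    Assumed heuristic; rarer classes receive larger weights.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in counts}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: "fear" is the minority class.
train_labels = ["anger"] * 6 + ["fear"] * 2
weights = class_weights(train_labels)

y_true = ["anger", "anger", "anger", "fear"]
y_pred = ["anger", "anger", "fear", "fear"]
score = macro_f1(y_true, y_pred)
```

Because the macro average gives each class the same influence regardless of its size, it is a natural metric for imbalanced datasets: a model that ignores a minority class is penalized as heavily as one that ignores a majority class.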
