Sentiment analysis is the process of identifying and extracting subjective information from text. Despite advances in automatic cross-lingual approaches, implementing and evaluating sentiment analysis systems still requires language-specific data that accounts for sociocultural and linguistic peculiarities. In this paper, we describe the collection and annotation of a dataset for sentiment analysis of Central Kurdish. We explore a few classical machine learning and neural network-based techniques for this task. Additionally, we employ a transfer-learning approach that leverages pretrained models for data augmentation. We demonstrate that data augmentation achieves a high F$_1$ score and accuracy despite the difficulty of the task.
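To make the notions of a "classical machine learning" baseline and of accuracy and F$_1$ evaluation concrete, the following is a minimal sketch of a multinomial Naive Bayes sentiment classifier with add-one (Laplace) smoothing, together with accuracy and macro-averaged F$_1$. It is an illustration under stated assumptions, not the method used in this paper. The abstract does not say which classical models, tokenization, or F$_1$ averaging were used. Naive Bayes, whitespace tokenization, macro averaging, the class name `NaiveBayes`, and the English toy sentences are all hypothetical choices made here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Central Kurdish text
    # would likely need script-aware normalization before this step.
    return text.lower().split()


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(text):
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    # Unweighted mean of per-class F1 over all labels seen in gold or pred.
    f1s = []
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)


# Toy, hypothetical examples (not taken from the paper's dataset).
train_texts = ["good great film", "great acting good", "bad boring plot", "boring bad acting"]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_texts, train_labels)

test_texts = ["good film", "boring plot"]
test_gold = ["pos", "neg"]
preds = [model.predict(t) for t in test_texts]
print(preds)  # ['pos', 'neg']
print(accuracy(test_gold, preds), macro_f1(test_gold, preds))  # 1.0 1.0
```

Macro averaging weights every sentiment class equally. This matters when class sizes are imbalanced, which is common in annotated sentiment data. Micro or weighted averaging would give different numbers on the same predictions.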