We present work on sentiment analysis of Macedonian tweets. As this is pioneering work for this combination of language and genre, we created suitable resources for training and evaluating a sentiment analysis system for Macedonian tweets. In particular, we developed a corpus of tweets annotated with tweet-level sentiment polarity (positive, negative, and neutral) as well as with phrase-level sentiment, which we made freely available for research purposes. We further bootstrapped several large-scale sentiment lexicons for Macedonian, motivated by previous work for English. Our experiments, which represent the first attempt to build a Twitter sentiment analysis system for the morphologically rich Macedonian language, show the impact of several pre-processing steps and of various features. Overall, our experimental results show an F1-score of 92.16, which is very strong and on par with the best results for English achieved in recent SemEval competitions.