The performance of sentiment analysis methods has improved greatly in recent years, driven by models based on the Transformer architecture, in particular BERT. However, deep neural network models are difficult to train and hard to interpret. An alternative is rule-based methods that rely on sentiment lexicons: they are fast, require no training, and are easy to interpret. With the widespread adoption of deep learning, however, lexicon-based methods have receded into the background. The purpose of this article is to study the performance of two lexicon-based methods, SO-CAL and SentiStrength, adapted for the Russian language. We tested these methods, as well as the RuBERT neural network model, on 16 text corpora and analyzed the results. On average, RuBERT outperforms both lexicon-based methods, but SO-CAL surpasses RuBERT on four of the 16 corpora.