In this study, we test a transfer learning approach on Russian sentiment benchmark datasets, using an additional training sample created with a distant supervision technique. We compare several ways of combining the additional data with the benchmark training samples. The best results were achieved with a three-step approach of sequential training on general, thematic, and original training samples. For most datasets, the results improved by more than 3% over the current state-of-the-art methods. The BERT-NLI model, which treats sentiment classification as a natural language inference task, reached human-level performance on one of the datasets.
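The two ideas named above, sequential training over three samples and recasting sentiment classification as natural language inference, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration rather than the setup used in the study: the label set, the hypothesis template, and the helper names `to_nli_pairs`, `predict`, and `sequential_fine_tune` are hypothetical, and `train_fn` stands in for whatever fine-tuning routine is actually used.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical label set and hypothesis template (not specified in the text).
LABELS = ("positive", "negative", "neutral")
TEMPLATE = "The sentiment of this text is {label}."


def to_nli_pairs(text: str, gold: str) -> List[Tuple[str, str, int]]:
    """Turn one labelled text into (premise, hypothesis, target) triples.

    The target is 1 when the hypothesis names the gold label (entailment)
    and 0 otherwise.
    """
    if gold not in LABELS:
        raise ValueError(f"unknown label: {gold}")
    return [
        (text, TEMPLATE.format(label=label), int(label == gold))
        for label in LABELS
    ]


def predict(entailment_scores: Dict[str, float]) -> str:
    """Choose the label whose hypothesis has the highest entailment score."""
    return max(entailment_scores, key=entailment_scores.get)


def sequential_fine_tune(
    model: Any,
    stages: Iterable[Tuple[str, Any]],
    train_fn: Callable[[Any, Any], Any],
) -> Any:
    """Fine-tune on each stage in order, carrying the model forward.

    For the three-step scheme, stages would be the general, thematic,
    and original training samples, in that order.
    """
    for _stage_name, data in stages:
        model = train_fn(model, data)
    return model
```

In this sketch, each text is expanded into one premise-hypothesis pair per candidate label, so a pair classifier can be trained on entailment targets, and prediction reduces to picking the best-scoring hypothesis. The staged loop only fixes the order of training; how long to train at each stage is left open.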