We present the first study of automatic detoxification of Russian texts to combat offensive language. This kind of textual style transfer can be used, for instance, for processing toxic content in social media. While much work has been done on this task for English, it has not yet been addressed for Russian. We test two types of models: an unsupervised approach based on the BERT architecture that performs local corrections, and a supervised approach based on the pretrained GPT-2 language model. We compare them with several baselines. In addition, we describe the evaluation setup, providing training datasets and metrics for automatic evaluation. The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.
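To illustrate the local-correction idea mentioned above, here is a minimal sketch: toxic tokens are detected (here with a toy lexicon standing in for a toxicity classifier), masked, and then replaced by a fill-mask model (here a stub standing in for BERT). The lexicon, the `fill_mask` stub, and the replacement word are illustrative assumptions, not the actual implementation evaluated in the paper.

```python
# Toy lexicon standing in for a toxicity classifier (assumption for illustration).
TOXIC_LEXICON = {"idiot", "stupid"}

def mask_toxic(tokens, lexicon=TOXIC_LEXICON):
    """Replace tokens found in the toxic lexicon with a [MASK] placeholder."""
    return [("[MASK]" if t.lower() in lexicon else t) for t in tokens]

def fill_mask(tokens):
    """Stub for a BERT-style fill-mask model: inserts a fixed neutral
    word for every mask (a real model would rank contextual candidates)."""
    return [("nice" if t == "[MASK]" else t) for t in tokens]

def detoxify(sentence):
    """Pipeline: tokenize, mask toxic tokens, fill masks, rejoin."""
    tokens = sentence.split()
    return " ".join(fill_mask(mask_toxic(tokens)))

print(detoxify("you are so stupid"))  # -> "you are so nice"
```

In the actual BERT-based approach, the stub would be replaced by a pretrained masked language model that proposes context-appropriate non-toxic substitutes for each masked position.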