俄文文本解毒方法 (Methods for Detoxification of Texts for the Russian Language) - 专知论文

会员服务 ·

0

Performer · MoDELS · GPT-2 · Processing（编程语言） · 无监督 ·

2021 年 5 月 19 日

Methods for Detoxification of Texts for the Russian Language

翻译：俄文文本解毒方法

Daryna Dementieva,Daniil Moskovskiy,Varvara Logacheva,David Dale,Olga Kozlova,Nikita Semenov,Alexander Panchenko

We introduce the first study of automatic detoxification of Russian texts to combat offensive language. Such a kind of textual style transfer can be used, for instance, for processing toxic content in social media. While much work has been done for the English language in this field, it has never been solved for the Russian language yet. We test two types of models - unsupervised approach based on BERT architecture that performs local corrections and supervised approach based on pretrained language GPT-2 model - and compare them with several baselines. In addition, we describe evaluation setup providing training datasets and metrics for automatic evaluation. The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.

翻译：我们首先研究俄罗斯文本的自动解毒,以打击攻击性语言。这种文字风格的转移可以用来处理社交媒体中的有毒内容。虽然在这方面已经为英语做了大量工作,但还没有解决俄语的问题。我们测试了两类模式,即基于BERT结构的不受监督的方法,即根据预先培训的语言GPT-2模式进行地方校正和监管方法,并与几个基线进行比较。此外,我们描述了评估机制,为自动评估提供培训数据集和衡量标准。结果显示,经过测试的方法可以成功地用于解毒,尽管还有改进的余地。

0

相关内容

Performer

【EMNLP2020】自然语言生成，Neural Language Generation

【EMNLP2020】自然语言生成，Neural Language Generation

专知会员服务

39+阅读 · 2020年11月20日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

专知会员服务

33+阅读 · 2020年2月29日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【NUS】神经问题生成的最近进展（Recent Advances in Neural Question Generation）

【NUS】神经问题生成的最近进展（Recent Advances in Neural Question Generation）

专知会员服务

16+阅读 · 2019年12月22日

【博士论文】自然语言处理的神经图嵌入方法，Neural Graph Embedding methods for Natural Language Processing

【博士论文】自然语言处理的神经图嵌入方法，Neural Graph Embedding methods for Natural Language Processing

专知会员服务

80+阅读 · 2019年11月5日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

【CCL 2019】如何微调BERT进行文本分类？（How to Fine-Tune BERT for Text Classification?）

【CCL 2019】如何微调BERT进行文本分类？（How to Fine-Tune BERT for Text Classification?）

专知会员服务

84+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

Graph Neural Networks for Natural Language Processing: A Survey

Arxiv

36+阅读 · 2021年6月10日

Pre-trained Models for Natural Language Processing: A Survey

Arxiv

113+阅读 · 2020年3月18日

A Comparison of Neural Network Training Methods for Text Classification

Arxiv

6+阅读 · 2019年10月28日

A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis

A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis

Arxiv

7+阅读 · 2019年10月21日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

BERT for Joint Intent Classification and Slot Filling

Arxiv

13+阅读 · 2019年2月28日

Analysis Methods in Neural Language Processing: A Survey

Analysis Methods in Neural Language Processing: A Survey

Arxiv

4+阅读 · 2019年1月14日

Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering

Arxiv

7+阅读 · 2018年6月12日

A Comparison of Word Embeddings for the Biomedical Natural Language Processing

Arxiv

3+阅读 · 2018年2月1日

Fine-tuned Language Models for Text Classification

Arxiv

5+阅读 · 2018年1月18日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

【EMNLP2020】自然语言生成，Neural Language Generation

【EMNLP2020】自然语言生成，Neural Language Generation

专知会员服务

39+阅读 · 2020年11月20日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

专知会员服务

33+阅读 · 2020年2月29日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【NUS】神经问题生成的最近进展（Recent Advances in Neural Question Generation）

【NUS】神经问题生成的最近进展（Recent Advances in Neural Question Generation）

专知会员服务

16+阅读 · 2019年12月22日

【博士论文】自然语言处理的神经图嵌入方法，Neural Graph Embedding methods for Natural Language Processing

【博士论文】自然语言处理的神经图嵌入方法，Neural Graph Embedding methods for Natural Language Processing

专知会员服务

80+阅读 · 2019年11月5日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

【CCL 2019】如何微调BERT进行文本分类？（How to Fine-Tune BERT for Text Classification?）

【CCL 2019】如何微调BERT进行文本分类？（How to Fine-Tune BERT for Text Classification?）

专知会员服务

84+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《军事域人工智能风险、机遇与治理战略指导报告》2025最新76页报告

《杀伤网与精确规模：智能饱和战争时代的战略要务-印度视角》2025最新报告

俄乌冲突的地缘政治与军事教训（万字长文）

《弹药快速效能建模：推进互操作性与技术优势》2025最新26页报告

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

相关论文

Graph Neural Networks for Natural Language Processing: A Survey

Arxiv

36+阅读 · 2021年6月10日

Pre-trained Models for Natural Language Processing: A Survey

Arxiv

113+阅读 · 2020年3月18日

A Comparison of Neural Network Training Methods for Text Classification

Arxiv

6+阅读 · 2019年10月28日

A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis

A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis

Arxiv

7+阅读 · 2019年10月21日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

BERT for Joint Intent Classification and Slot Filling

Arxiv

13+阅读 · 2019年2月28日

Analysis Methods in Neural Language Processing: A Survey

Analysis Methods in Neural Language Processing: A Survey

Arxiv

4+阅读 · 2019年1月14日

Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering

Arxiv

7+阅读 · 2018年6月12日

A Comparison of Word Embeddings for the Biomedical Natural Language Processing

Arxiv

3+阅读 · 2018年2月1日

Fine-tuned Language Models for Text Classification

Arxiv

5+阅读 · 2018年1月18日

微信扫码咨询专知VIP会员