Civil Rephrases Of Toxic Texts With Self-Supervised Transformers

Platforms that support online commentary, from social networks to news sites, increasingly use machine learning to assist their moderation efforts. But this process typically gives authors no feedback that would help them contribute in line with the community guidelines. Providing such feedback is prohibitively time-consuming for human moderators, and computational approaches are still nascent. This work focuses on models that can suggest more civil rephrasings of toxic comments. Inspired by recent progress in unpaired sequence-to-sequence tasks, we introduce a self-supervised learning model called CAE-T5. CAE-T5 employs a pre-trained text-to-text transformer, which is fine-tuned with a denoising and cyclic auto-encoder loss. In experiments on the largest toxicity detection dataset to date (Civil Comments), our model generates sentences that are more fluent and better at preserving the initial content than earlier text style transfer systems, as measured by several scoring systems and human evaluation.
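The abstract names two training signals, a denoising loss and a cyclic auto-encoder loss, without giving their details. The sketch below is only an illustration of how such training examples are commonly assembled, not the authors' implementation. The token-masking noise, the `<mask>` placeholder, the `transfer` callable (a stand-in for the model's own generation step), and the loss weights are all assumptions made for this example.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token, not taken from the paper


def add_noise(tokens, mask_prob=0.15, seed=None):
    """Return a corrupted copy of `tokens`.

    Each token is independently replaced by MASK with probability `mask_prob`.
    This simple masking scheme is an assumption; the source does not say
    which corruption function is used.
    """
    rng = random.Random(seed)
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def build_examples(tokens, style, other_style, transfer, mask_prob=0.15, seed=None):
    """Build (model_input, target_style, target) triples for both objectives.

    `transfer(tokens, style)` is a hypothetical callable that rewrites the
    tokens into the given style.
    """
    # Denoising: reconstruct the original text from its corrupted version.
    dae = (add_noise(tokens, mask_prob, seed), style, tokens)
    # Cycle: rewrite into the other style, then recover the original text.
    cae = (transfer(tokens, other_style), style, tokens)
    return dae, cae


def combined_loss(dae_loss, cae_loss, lambda_dae=1.0, lambda_cae=1.0):
    """Weighted sum of the two loss terms (the weights are illustrative)."""
    return lambda_dae * dae_loss + lambda_cae * cae_loss
```

In this sketch, no paired toxic/civil sentences are needed: both targets are the original comment itself, which is what makes the setup self-supervised.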

