Challenges in Detoxifying Language Models

September 15, 2021

Johannes Welbl, Amelia Glaese, Jonathan Uesato, Sumanth Dathathri, John Mellor, Lisa Anne Hendricks, Kirsty Anderson, Pushmeet Kohli, Ben Coppin, Po-Sen Huang

From arXiv; 23 pages, 6 figures; published in Findings of EMNLP 2021.

Large language models (LM) generate remarkably fluent text and can be efficiently adapted across NLP tasks. Measuring and guaranteeing the quality of generated text in terms of safety is imperative for deploying LMs in the real world; to this end, prior work often relies on automatic evaluation of LM toxicity. We critically discuss this approach, evaluate several toxicity mitigation strategies with respect to both automatic and human evaluation, and analyze consequences of toxicity mitigation in terms of model bias and LM quality. We demonstrate that while basic intervention strategies can effectively optimize previously established automatic metrics on the RealToxicityPrompts dataset, this comes at the cost of reduced LM coverage for both texts about, and dialects of, marginalized groups. Additionally, we find that human raters often disagree with high automatic toxicity scores after strong toxicity reduction interventions -- highlighting further the nuances involved in careful evaluation of LM toxicity.
