Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models


February 14, 2023


Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

From arXiv. This paper will be presented at EACL 2023.

Pretrained large language models have become indispensable for solving various natural language processing (NLP) tasks. However, safely deploying them in real-world applications is challenging because they generate toxic content. To address this challenge, we propose two novel pretraining data augmentation strategies that significantly reduce model toxicity without compromising its utility. Our two strategies are: (1) MEDA, which adds the raw toxicity score as meta-data to the pretraining samples, and (2) INST, which adds instructions to those samples indicating their toxicity. Our results indicate that our best-performing strategy (INST) reduces the toxicity probability by up to 61% while preserving the accuracy on five benchmark NLP tasks and improving AUC scores on four bias detection tasks by 1.3%. We also demonstrate the generalizability of our techniques by scaling the number of training samples and the number of model parameters.
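The two augmentation strategies named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact text format, so everything concrete here is an assumption made for illustration: the `Sample` type, the function names `augment_meda` and `augment_inst`, the template strings, the 0.5 threshold, and the idea that each sample already carries a toxicity score in [0, 1] from some external classifier. It is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """A pretraining sample with a precomputed toxicity score (assumed in [0, 1])."""
    text: str
    toxicity: float


def augment_meda(sample: Sample) -> str:
    # MEDA-style: attach the raw toxicity score as meta-data.
    # The "toxicity: <score> | <text>" layout is a hypothetical format.
    return f"toxicity: {sample.toxicity:.2f} | {sample.text}"


def augment_inst(sample: Sample, threshold: float = 0.5) -> str:
    # INST-style: attach a natural-language instruction describing toxicity.
    # The wording and the 0.5 threshold are hypothetical choices.
    label = "toxic" if sample.toxicity >= threshold else "non-toxic"
    return f"The following text is {label}: {sample.text}"


if __name__ == "__main__":
    s = Sample(text="Have a nice day.", toxicity=0.03)
    print(augment_meda(s))
    print(augment_inst(s))
```

In a setup like this, generation at inference time would presumably be conditioned on the low-toxicity form of the prefix (for example, the "non-toxic" instruction), which is how such a control signal could steer the model's output.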

