Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more. Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences. We benchmark five objectives for pretraining with human feedback across three tasks and study how they affect the trade-off between alignment and capabilities of pretrained LMs. We find a Pareto-optimal and simple approach among those we explored: conditional training, or learning a distribution over tokens conditional on their human preference scores given by a reward model. Conditional training reduces the rate of undesirable content by up to an order of magnitude, both when generating without a prompt and with an adversarially-chosen prompt. Moreover, conditional training maintains the downstream task performance of standard LM pretraining, both before and after task-specific finetuning. Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback, i.e., learning and then unlearning undesirable behavior. Our results suggest that we should move beyond imitation learning when pretraining LMs and incorporate human preferences from the start of training.
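To make the conditional training objective concrete, the following is a minimal sketch of how pretraining text might be annotated with control tokens based on reward-model scores. The names `reward_score`, `annotate`, `GOOD_TOKEN`, `BAD_TOKEN`, and the threshold are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of conditional training data preparation: prepend a control
# token to each text segment according to a reward-model score, then pretrain
# with the usual LM objective so the model learns a distribution over tokens
# conditioned on the control token. Hypothetical names throughout.
from typing import List

GOOD_TOKEN = "<|good|>"  # marks segments the reward model prefers
BAD_TOKEN = "<|bad|>"    # marks segments the reward model disprefers


def reward_score(segment: str) -> float:
    # Stand-in for a learned reward model returning a human-preference score;
    # a trivial heuristic here so the sketch runs end to end.
    return -1.0 if "password" in segment.lower() else 1.0


def annotate(segments: List[str], threshold: float = 0.0) -> List[str]:
    """Prepend GOOD_TOKEN or BAD_TOKEN to each segment based on its score.

    At inference time, prompting the pretrained LM with GOOD_TOKEN steers
    generation toward preferred content.
    """
    return [
        (GOOD_TOKEN if reward_score(seg) >= threshold else BAD_TOKEN) + seg
        for seg in segments
    ]


if __name__ == "__main__":
    docs = ["Here is a helpful explanation.", "My password is hunter2."]
    print(annotate(docs))
```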