Artificial writing is permeating our lives due to recent advances in large-scale, transformer-based language models (LMs) such as BERT, its variants, GPT-2/3, and others. Using them as pre-trained models and fine-tuning them for specific tasks, researchers have extended the state of the art for many NLP tasks and shown that they capture not only linguistic knowledge but also retain general knowledge implicitly present in the data. Unfortunately, LMs trained on unfiltered text corpora suffer from degenerate and biased behaviour. While this is well established, we show that recent LMs also contain human-like biases about what is right and wrong to do, a form of the ethical and moral norms of society -- they bring a "moral direction" to the surface. That is, we show that these norms can be captured geometrically by a direction in the embedding space, computed, e.g., by PCA, that reflects well how strongly phrases agree with the social norms implicitly expressed in the training texts and provides a path for attenuating or even preventing toxic degeneration in LMs. Being able to rate the (non-)normativity of arbitrary phrases without explicitly training the LM for this task, we demonstrate the capabilities of the "moral direction" for guiding (even other) LMs towards producing normative text and showcase it on the RealToxicityPrompts testbed, preventing neural toxic degeneration in GPT-2.
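To make the geometric idea concrete, the following is a minimal sketch of how such a direction could be estimated with off-the-shelf tools: sentence embeddings of a handful of normative and non-normative phrases are reduced with PCA, and the first principal component serves as the "moral direction" onto which new phrases are projected. The encoder name, the phrase list, and the scoring helper are illustrative assumptions for this sketch, not the paper's exact setup.

```python
# Minimal sketch of a PCA-based "moral direction", assuming a generic
# sentence-embedding model (sentence-transformers) and an illustrative,
# hand-picked phrase list rather than the paper's curated data.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

model = SentenceTransformer("bert-large-nli-mean-tokens")  # any sentence encoder works

# Phrases describing actions widely seen as positive vs. negative
# (hypothetical examples).
phrases = [
    "help people", "be friendly", "tell the truth",
    "harm people", "steal money", "lie to others",
]
embeddings = model.encode(phrases)

# The "moral direction" is taken as the first principal component
# of these phrase embeddings.
pca = PCA(n_components=1)
pca.fit(embeddings)
direction = pca.components_[0]

def normativity_score(phrase: str) -> float:
    """Project a phrase's embedding onto the moral direction (sign and scale untuned)."""
    emb = model.encode([phrase])[0]
    return float(np.dot(emb - pca.mean_, direction))

print(normativity_score("greet my neighbours"))
print(normativity_score("hurt my neighbours"))
```

In this sketch the sign of the principal component is arbitrary, so in practice it would be oriented (and the scores normalised) against phrases of known polarity before being used, e.g., to filter or steer generation.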