Large language models (LLMs) show impressive abilities via few-shot prompting. Commercialized APIs such as OpenAI GPT-3 further increase their use in real-world language applications. However, existing research focuses on models' accuracy on standard benchmarks and largely ignores their reliability, which is crucial for avoiding catastrophic real-world harms. While reliability is a broad and vaguely defined term, this work decomposes reliability into four facets: generalizability, fairness, calibration, and factuality. We establish simple and effective prompts to demonstrate GPT-3's reliability in these four aspects, showing that it can: 1) generalize out-of-domain, 2) balance demographic distributions to reduce social biases, 3) calibrate language model probabilities, and 4) update the LLM's knowledge. We find that, with appropriate prompts, GPT-3 outperforms smaller-scale supervised models by large margins on all these facets. We release all processed datasets, evaluation scripts, and model predictions to facilitate future analysis. Our findings not only offer new insights into the reliability of prompting LLMs, but, more importantly, our prompting strategies can help practitioners use large language models like GPT-3 more reliably.
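To make the calibration facet concrete, the sketch below shows how token log-probabilities returned by GPT-3 can be turned into a confidence score for a classification prompt. This is a minimal illustration, not the paper's exact setup: it assumes the legacy `openai` Python package (pre-1.0) and its `Completion.create` endpoint with the `logprobs` parameter; the model name, prompt, and label tokens are placeholders.

```python
# Minimal sketch: deriving a calibrated-style confidence score from
# GPT-3 token log-probabilities on a yes/no classification prompt.
# Assumes the legacy openai Python package (pre-1.0); model, prompt,
# and label tokens below are illustrative placeholders.
import math
import openai

openai.api_key = "YOUR_API_KEY"  # set your own key

prompt = (
    "Premise: A man is playing guitar on stage.\n"
    "Hypothesis: A person is performing music.\n"
    "Does the premise entail the hypothesis? Answer yes or no:"
)

response = openai.Completion.create(
    model="text-davinci-003",  # assumed legacy GPT-3 model name
    prompt=prompt,
    max_tokens=1,
    temperature=0,
    logprobs=5,  # return log-probs of the top-5 candidate tokens
)

# Log-probabilities of the top candidate tokens at the answer position.
# Note: exact token strings vary by tokenizer (often a leading space,
# e.g. " yes"); inspect the response in practice.
top_logprobs = response["choices"][0]["logprobs"]["top_logprobs"][0]

# Normalize the probability mass over the two label tokens to get a
# confidence score that can later be checked against accuracy.
p_yes = math.exp(top_logprobs.get(" yes", float("-inf")))
p_no = math.exp(top_logprobs.get(" no", float("-inf")))
total = p_yes + p_no
confidence = max(p_yes, p_no) / total if total > 0 else 0.5
label = "yes" if p_yes >= p_no else "no"
print(f"predicted: {label} (confidence {confidence:.2f})")
```

Normalizing over the label tokens converts raw log-probabilities into a per-prediction confidence, which can then be compared against empirical accuracy (e.g., via expected calibration error) to assess how well calibrated the model's probabilities are.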