We fine-tune GPT-3 to answer long-form questions using a text-based web-browsing environment, which allows the model to search and navigate the web. By setting up the task so that it can be performed by humans, we are able to train models on the task using imitation learning, and then optimize answer quality with human feedback. To make human evaluation of factual accuracy easier, models must collect references while browsing in support of their answers. We train and evaluate our models on ELI5, a dataset of questions asked by Reddit users. Our best model is obtained by fine-tuning GPT-3 using behavior cloning, and then performing rejection sampling against a reward model trained to predict human preferences. This model's answers are preferred by humans 56% of the time to those of our human demonstrators, and 69% of the time to the highest-voted answer from Reddit.
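To make the final step concrete, the following is a minimal Python sketch of the rejection-sampling (best-of-n) procedure the abstract describes: draw several candidate answers from the behavior-cloned policy and keep the one the reward model scores highest. The sampling function, reward function, and all names below are hypothetical stand-ins for illustration, not the paper's implementation.

```python
import random
from typing import Callable

def best_of_n(
    question: str,
    sample_answer: Callable[[str], str],  # hypothetical: draws one answer from the BC policy
    reward: Callable[[str, str], float],  # hypothetical: reward-model score for (question, answer)
    n: int = 16,
) -> str:
    """Rejection sampling (best-of-n): sample n candidate answers from the
    behavior-cloned policy, then return the candidate the reward model
    trained on human preferences scores highest."""
    candidates = [sample_answer(question) for _ in range(n)]
    return max(candidates, key=lambda ans: reward(question, ans))

# Toy stand-ins so the sketch runs end to end; in the paper, both the policy
# and the reward model are fine-tuned GPT-3 models, which are not shown here.
if __name__ == "__main__":
    pool = ["short answer", "a longer answer with supporting references", "off-topic reply"]
    pick = best_of_n(
        "Why is the sky blue?",
        sample_answer=lambda q: random.choice(pool),
        reward=lambda q, a: float(len(a)),  # placeholder heuristic, not a real reward model
        n=8,
    )
    print(pick)
```

Note that n trades compute for answer quality: more samples give the reward model more candidates to choose from, at the cost of n forward passes per question.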