Self-critiquing models for assisting human evaluators

We fine-tune large language models to write natural language critiques using behavioral cloning. On a topic-based summarization task, critiques written by our models help humans find flaws in summaries that they would otherwise have missed. Our models help find naturally occurring flaws in both model-written and human-written summaries, as well as intentional flaws in summaries that humans wrote to be deliberately misleading. We study the scaling properties of critiquing on both topic-based summarization and synthetic tasks. Larger models write more helpful critiques and, on most tasks, are better at self-critiquing, despite having harder-to-critique outputs. Larger models can also integrate their own self-critiques as feedback, refining their own summaries into better ones. Finally, we motivate and introduce a framework for comparing critiquing ability to generation and discrimination ability. Our measurements suggest that even large models may still have relevant knowledge they cannot or do not articulate as critiques. These results are a proof of concept for using AI-assisted human feedback to scale the supervision of machine learning systems to tasks that are difficult for humans to evaluate directly. We release our training datasets, as well as samples from our critique assistance experiments.
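The generate, critique, refine loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the signatures, and the toy stand-ins below are hypothetical and are not the paper's code or models. In the paper, each of the three steps would be performed by a fine-tuned language model rather than by string rules.

```python
from typing import Callable, Dict

# Hypothetical signatures; in the paper these roles are played by language models.
Generate = Callable[[str, str], str]            # (passage, question) -> summary
Critique = Callable[[str, str, str], str]       # (passage, question, summary) -> critique
Refine = Callable[[str, str, str, str], str]    # (passage, question, summary, critique) -> summary


def refine_with_self_critique(
    passage: str,
    question: str,
    generate: Generate,
    critique: Critique,
    refine: Refine,
) -> Dict[str, str]:
    """Run one round of generate -> self-critique -> refine."""
    summary = generate(passage, question)
    critique_text = critique(passage, question, summary)
    refinement = refine(passage, question, summary, critique_text)
    return {"summary": summary, "critique": critique_text, "refinement": refinement}


# Toy stand-ins so the sketch runs without any model (illustrative only).
def toy_generate(passage: str, question: str) -> str:
    # Naive "summary": the first sentence of the passage.
    return passage.split(".")[0] + "."


def toy_critique(passage: str, question: str, summary: str) -> str:
    # Flag a summary that never mentions the topic being asked about.
    if question.lower() in summary.lower():
        return ""
    return f"The summary does not mention '{question}'."


def toy_refine(passage: str, question: str, summary: str, critique_text: str) -> str:
    # With no critique there is nothing to fix.
    if not critique_text:
        return summary
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    relevant = [s for s in sentences if question.lower() in s.lower()]
    return summary + " " + relevant[0] + "." if relevant else summary


result = refine_with_self_critique(
    "The town held a fair. The mayor praised the bakers.",
    "mayor",
    toy_generate,
    toy_critique,
    toy_refine,
)
print(result)
```

Passing the three steps in as callables keeps the loop independent of any particular model, so the same structure would apply whether the critique comes from the model that wrote the summary (self-critique) or from a different one.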
