Large language models (LLMs) have emerged as valuable tools for many natural language understanding tasks. In safety-critical applications such as healthcare, the utility of these models is governed by their ability to generate outputs that are factually accurate and complete. In this work, we present dialog-enabled resolving agents (DERA). DERA is a paradigm made possible by the increased conversational abilities of LLMs, namely GPT-4. It provides a simple, interpretable forum for models to communicate feedback and iteratively improve output. We frame our dialog as a discussion between two agent types - a Researcher, who processes information and identifies crucial problem components, and a Decider, who has the autonomy to integrate the Researcher's information and make judgments on the final output. We test DERA against three clinically-focused tasks. For medical conversation summarization and care plan generation, DERA shows significant improvement over the base GPT-4 performance in both human expert preference evaluations and quantitative metrics. In a new finding, we also show that GPT-4's performance (70%) on an open-ended version of the MedQA question-answering (QA) dataset (Jin et al. 2021, USMLE) is well above the passing level (60%), with DERA showing similar performance. We release the open-ended MedQA dataset at https://github.com/curai/curai-research/tree/main/DERA.