Real human conversation data are complex, heterogeneous, and noisy, and building open-domain dialogue systems from such data remains a challenging task. In fact, such dialogue data still contain a wealth of information and knowledge; however, they have not been fully explored. In this paper, we show that existing open-domain dialogue generation methods, which memorize context-response paired data with autoregressive or encoder-decoder language models, underutilize the training data. Unlike current approaches that rely on external knowledge, we explore a retrieval-generation training framework that can take advantage of heterogeneous and noisy training data by treating them as "evidence". In particular, we use BERTScore for retrieval, which yields better quality in both the evidence and the generation. Experiments on publicly available datasets demonstrate that our method helps models generate better responses, even when the training data are generally regarded as low quality. The performance gain is comparable to, or even better than, that obtained by enlarging the training set. We also find that model performance is positively correlated with the relevance of the retrieved evidence. Moreover, our method performs well in zero-shot experiments, which indicates that it can be more robust to real-world data.
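To make the retrieval step concrete, the sketch below ranks training pairs as evidence for a query context using a BERTScore-style greedy-matching F1. This is a minimal illustration only: it substitutes token-identity matching for the cosine similarity over contextual BERT embeddings that actual BERTScore computes, and the corpus, function names, and ranking scheme are illustrative assumptions, not the paper's implementation.

```python
def greedy_f1(cand_tokens, ref_tokens):
    # BERTScore-style greedy-matching F1, with token identity standing in
    # for cosine similarity of contextual embeddings (a simplification).
    if not cand_tokens or not ref_tokens:
        return 0.0
    recall = sum(1.0 for t in ref_tokens if t in cand_tokens) / len(ref_tokens)
    precision = sum(1.0 for t in cand_tokens if t in ref_tokens) / len(cand_tokens)
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def retrieve_evidence(query, corpus, k=2):
    # Score each (context, response) training pair by the similarity of
    # its context to the query, and return the top-k pairs as "evidence".
    q = query.lower().split()
    scored = [(greedy_f1(ctx.lower().split(), q), ctx, resp) for ctx, resp in corpus]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [(ctx, resp) for _, ctx, resp in scored[:k]]

# Toy training corpus of context-response pairs (illustrative only).
corpus = [
    ("do you enjoy jazz", "yes I listen to jazz every day"),
    ("what is your favorite food", "I love sushi"),
    ("which music do you like", "mostly rock and jazz"),
]
evidence = retrieve_evidence("what music do you like", corpus, k=2)
```

In a real setup, the token-identity similarity would be replaced by BERTScore computed over contextual embeddings, so that semantically related but lexically different contexts can still be retrieved as evidence.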