用于微光对话框生成的自增强数据选择 (Self-augmented Data Selection for Few-shot Dialogue Generation) - 专知论文

会员服务 ·

0

数据选择 · 任务对话系统 · 小样本学习 · MoDELS · 伪标记 ·

2022 年 5 月 19 日

Self-augmented Data Selection for Few-shot Dialogue Generation

翻译：用于微光对话框生成的自增强数据选择

Wanyu Du,Hanjie Chen,Yangfeng Ji

The natural language generation (NLG) module in task-oriented dialogue systems translates structured meaning representations (MRs) into text responses, which has a great impact on users' experience as the human-machine interaction interface. However, in practice, developers often only have a few well-annotated data and confront a high data collection cost to build the NLG module. In this work, we adopt the self-training framework to deal with the few-shot MR-to-Text generation problem. We leverage the pre-trained language model to self-augment many pseudo-labeled data. To prevent the gradual drift from target data distribution to noisy augmented data distribution, we propose a novel data selection strategy to select the data that our generation model is most uncertain about. Compared with existing data selection methods, our method is: (1) parameter-efficient, which does not require training any additional neural models, (2) computation-efficient, which only needs to apply several stochastic forward passes of the model to estimate the uncertainty. We conduct empirical experiments on two benchmark datasets: FewShotWOZ and FewShotSGD, and show that our proposed framework consistently outperforms other baselines in terms of BLEU and ERR.

翻译：以任务为导向的对话系统中的自然语言生成模块(NLG)在任务为导向的对话系统中的自然语言生成模块将结构含义表示(MRs)转化为文本响应,这对用户作为人机互动界面的经验有很大影响,但在实践中,开发者往往只拥有少量附加说明的数据,而且要面对高额数据收集成本来建立NLG模块。在这项工作中,我们采用自培训框架来应对微弱的MR-Text生成问题。我们利用预先培训的语言模型来自我增强许多假标签数据。为了防止目标数据分配逐渐从目标数据流流向噪音增强的数据分配,我们提出了一个新的数据选择战略,以选择我们生成模型最不确定的数据。与现有的数据选择方法相比,我们的方法是:(1) 参数效率,它不需要培训任何额外的神经模型,(2) 计算效率,它只需要使用模型的若干随机前传来估计不确定性。我们在两个基准数据集上进行了实验:很少ShotWOZ和WIPShoSGD, 并显示我们提议的框架始终超越EUR的其他基准。

0

相关内容

数据选择

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Call for Nominations: 2022 Multimedia Prize Paper Award

Call for Nominations: 2022 Multimedia Prize Paper Award

CCF多媒体专委会

0+阅读 · 2022年2月12日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

领域驱动空间co-location模式挖掘技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

具有临界指数的Schrodinger-Poisson系统的解

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

靶向LMP1干扰通过PI3K/Akt/mTOR通路逆转鼻咽癌细胞的TRAIL抵抗

国家自然科学基金

0+阅读 · 2013年12月31日

激波与边界层干扰下高超声速气动热的高精度数值模拟

国家自然科学基金

0+阅读 · 2013年12月31日

滑坡碎屑流持速效应机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

真空管道高速系统热压耦合生热规律及能耗研究

国家自然科学基金

0+阅读 · 2012年12月31日

金属纳米颗粒与半导体量子点耦合效应机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

面向多核处理器的硬软件协作Transactional Memory系统结构

国家自然科学基金

0+阅读 · 2008年12月31日

A Survey on Participant Selection for Federated Learning in Mobile Networks

Arxiv

0+阅读 · 2022年7月8日

OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering

Arxiv

0+阅读 · 2022年7月8日

PoseGU: 3D Human Pose Estimation with Novel Human Pose Generator and Unbiased Learning

Arxiv

0+阅读 · 2022年7月7日

Adaptive Personlization in Federated Learning for Highly Non-i.i.d. Data

Adaptive Personlization in Federated Learning for Highly Non-i.i.d. Data

Arxiv

0+阅读 · 2022年7月7日

Selectively increasing the diversity of GAN-generated samples

Arxiv

0+阅读 · 2022年7月7日

Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

Arxiv

0+阅读 · 2022年7月7日

A Survey on Data Augmentation for Text Classification

A Survey on Data Augmentation for Text Classification

Arxiv

16+阅读 · 2021年7月7日

KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning

Arxiv

27+阅读 · 2021年1月21日

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

Arxiv

20+阅读 · 2020年12月22日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

VIP会员

文章信息

相关主题

任务对话系统

小样本学习

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《巡飞弹药（爆炸性无人机）威胁态势分析》最新24页报告

《军用后勤无人机：破解战场运输挑战的创新方案》

人工智能战争：以色列、伊朗与新型AI战争形态

《俄乌战争：现代战争未来的启示与经验》

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Call for Nominations: 2022 Multimedia Prize Paper Award

Call for Nominations: 2022 Multimedia Prize Paper Award

CCF多媒体专委会

0+阅读 · 2022年2月12日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

A Survey on Participant Selection for Federated Learning in Mobile Networks

Arxiv

0+阅读 · 2022年7月8日

OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering

Arxiv

0+阅读 · 2022年7月8日

PoseGU: 3D Human Pose Estimation with Novel Human Pose Generator and Unbiased Learning

Arxiv

0+阅读 · 2022年7月7日

Adaptive Personlization in Federated Learning for Highly Non-i.i.d. Data

Adaptive Personlization in Federated Learning for Highly Non-i.i.d. Data

Arxiv

0+阅读 · 2022年7月7日

Selectively increasing the diversity of GAN-generated samples

Arxiv

0+阅读 · 2022年7月7日

Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

Arxiv

0+阅读 · 2022年7月7日

A Survey on Data Augmentation for Text Classification

A Survey on Data Augmentation for Text Classification

Arxiv

16+阅读 · 2021年7月7日

KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning

Arxiv

27+阅读 · 2021年1月21日

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

Arxiv

20+阅读 · 2020年12月22日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

相关基金

领域驱动空间co-location模式挖掘技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

具有临界指数的Schrodinger-Poisson系统的解

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

靶向LMP1干扰通过PI3K/Akt/mTOR通路逆转鼻咽癌细胞的TRAIL抵抗

国家自然科学基金

0+阅读 · 2013年12月31日

激波与边界层干扰下高超声速气动热的高精度数值模拟

国家自然科学基金

0+阅读 · 2013年12月31日

滑坡碎屑流持速效应机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

真空管道高速系统热压耦合生热规律及能耗研究

国家自然科学基金

0+阅读 · 2012年12月31日

金属纳米颗粒与半导体量子点耦合效应机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

面向多核处理器的硬软件协作Transactional Memory系统结构

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员