Data-to-text generation systems aim to generate text descriptions based on input data (often represented in tabular form). A typical system relies on a large number of training samples to learn the correspondence between tables and texts. However, large training sets are expensive to obtain, which limits the applicability of these approaches in real-world scenarios. In this work, we focus on few-shot data-to-text generation. We observe that, while fine-tuned pretrained language models may generate plausible sentences, they suffer from a low-semantic-coverage problem in the few-shot setting: important input slots tend to be missing from the generated text. To address this, we propose a search-and-learning approach that leverages pretrained language models but inserts the missing slots to improve semantic coverage. We further fine-tune our system on the search results to smooth out the search noise, yielding better-quality text and substantially improving inference efficiency. Experiments show that our model achieves strong performance on the E2E and WikiBio datasets. In particular, we cover 98.35% of input slots on E2E, largely alleviating the low-coverage problem.
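As a rough illustration of the slot-insertion idea, the sketch below checks which input slots are missing from a generated sentence and greedily inserts each one at the position a scoring function prefers. This is a minimal sketch under strong assumptions, not the paper's actual method: coverage is approximated by case-insensitive string matching, and `score_fluency` is a dummy stand-in for a pretrained language-model score; all names here (`missing_slots`, `candidate_insertions`, etc.) are hypothetical.

```python
# Minimal sketch of slot-insertion search for semantic coverage.
# All names are illustrative; this is not the paper's implementation.

from typing import Dict, List, Tuple


def missing_slots(slots: Dict[str, str], text: str) -> List[Tuple[str, str]]:
    """Return the (key, value) pairs whose value is absent from the text."""
    return [(k, v) for k, v in slots.items() if v.lower() not in text.lower()]


def candidate_insertions(text: str, value: str) -> List[str]:
    """Enumerate texts with the slot value inserted at every word boundary."""
    words = text.split()
    return [" ".join(words[:i] + [value] + words[i:])
            for i in range(len(words) + 1)]


def score_fluency(text: str) -> float:
    """Placeholder for a pretrained-LM fluency score (e.g., negative
    perplexity). A real system would query a language model here."""
    return 0.0  # dummy score: all candidates tie, so the first is picked


def insert_missing_slots(slots: Dict[str, str], text: str) -> str:
    """Greedy search: insert each missing slot at its best-scoring position."""
    for _, value in missing_slots(slots, text):
        text = max(candidate_insertions(text, value), key=score_fluency)
    return text


if __name__ == "__main__":
    record = {"name": "The Eagle", "food": "French", "area": "riverside"}
    draft = "The Eagle serves French food."  # 'riverside' is missing
    # With a real LM score, the insertion would land at a fluent position.
    print(insert_missing_slots(record, draft))
```

With a genuine language-model scorer in place of the dummy, the greedy search would select the insertion point that keeps the sentence fluent, which is the role the pretrained model plays in the search step; fine-tuning on the search outputs then removes the need for this search at inference time.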