为真实世界问题解答生成、注解和使用合成数据管道 (A Pipeline for Generating, Annotating and Employing Synthetic Data for Real World Question Answering) - 专知论文

会员服务 ·

0

自动问答 · MoDELS · INFORMS · state-of-the-art · Performer ·

2022 年 11 月 30 日

A Pipeline for Generating, Annotating and Employing Synthetic Data for Real World Question Answering

翻译：为真实世界问题解答生成、注解和使用合成数据管道

Matthew Maufe,James Ravenscroft,Rob Procter,Maria Liakata

from arxiv, To be published in the companion proceedings of EMNLP 2022. 17 pages (11 of which are in the appendix), 7 figures (3 of which are in the appendix)

Question Answering (QA) is a growing area of research, often used to facilitate the extraction of information from within documents. State-of-the-art QA models are usually pre-trained on domain-general corpora like Wikipedia and thus tend to struggle on out-of-domain documents without fine-tuning. We demonstrate that synthetic domain-specific datasets can be generated easily using domain-general models, while still providing significant improvements to QA performance. We present two new tools for this task: A flexible pipeline for validating the synthetic QA data and training downstream models on it, and an online interface to facilitate human annotation of this generated data. Using this interface, crowdworkers labelled 1117 synthetic QA pairs, which we then used to fine-tune downstream models and improve domain-specific QA performance by 8.75 F1.

翻译：问题解答(QA)是一个日益扩大的研究领域,通常用于便利从文件内提取信息。最先进的质量解答模式通常在Wikipedia这样的一般领域公司接受预先培训,因此往往在不作微调的情况下在外部文件上挣扎。我们证明,合成特定领域数据集可以很容易地使用一般领域模型生成,同时仍然为质量解答绩效提供重大改进。我们为此任务提出了两个新工具:验证合成质量解析数据和培训下游模型的灵活管道,以及便利人类对产生的数据进行批注的在线界面。我们利用这一界面,众工标有1117个合成质量解析配对,然后我们用这些对来微调下游模型,并在8.75 F1之前改进特定领域的质量解析性能。

0

相关内容

自动问答

自动问答（Question Answering, QA）是指利用计算机自动回答用户所提出的问题以满足用户知识需求的任务。不同于现有搜索引擎，问答系统是信息服务的一种高级形式，系统返回用户的不再是基于关键词匹配排序的文档列表，而是精准的自然语言答案。近年来，随着人工智能的飞速发展，自动问答已经成为倍受关注且发展前景广泛的研究方向。

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

104+阅读 · 2022年2月10日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

基于全基因组的ncRNA 序列分析发掘结核分枝杆菌和巨噬细胞的互作机制

国家自然科学基金

0+阅读 · 2014年12月31日

低维纳米结构形貌演变对复合界面材料热量输运特性的影响

国家自然科学基金

0+阅读 · 2014年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

内皮细胞TRPV4-SKCa3耦联稳态失调在高血压血管功能稳态失调中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

重金属废水制备新型Ferrite/LDH纳米复合材料及其催化吸附机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

拟南芥DIF（DRIP1-Interacting Factor）在胁迫信号应答中的功能分析

国家自然科学基金

0+阅读 · 2012年12月31日

多时域材料分割算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

调控家蚕发育非编码RNA（non-coding RNA, ncRNA）的功能解析

国家自然科学基金

0+阅读 · 2011年12月31日

利用疏水相互作用构建磁/光纳米晶二元复合超晶格胶体微球

国家自然科学基金

0+阅读 · 2009年12月31日

空心的电磁功能化导电聚合物微/纳米结构的构筑

国家自然科学基金

0+阅读 · 2009年12月31日

Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models

Arxiv

0+阅读 · 2023年2月1日

Chatbots for Robotic Process Automation: Investigating Perceived Trust and User Satisfaction

Arxiv

0+阅读 · 2023年2月1日

Learning Generalized Zero-Shot Learners for Open-Domain Image Geolocalization

Arxiv

0+阅读 · 2023年2月1日

A Cohesive Distillation Architecture for Neural Language Models

Arxiv

0+阅读 · 2023年1月29日

MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization

Arxiv

0+阅读 · 2023年1月28日

Graph Attention with Hierarchies for Multi-hop Question Answering

Arxiv

0+阅读 · 2023年1月27日

Generating Synthetic Data for Conversational Music Recommendation Using Random Walks and Language Models

Arxiv

0+阅读 · 2023年1月27日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

Contrastive learning of global and local features for medical image segmentation with limited annotations

Arxiv

19+阅读 · 2020年6月18日

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Arxiv

17+阅读 · 2018年3月20日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

104+阅读 · 2022年2月10日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【牛津博士论文】零样本强化学习综述

《美军条令：陆军指挥官与规划人员地理空间指南》60页

战术边缘指挥控制：防务面临的核心挑战

迈向开放世界检测：综述

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

相关论文

Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models

Arxiv

0+阅读 · 2023年2月1日

Chatbots for Robotic Process Automation: Investigating Perceived Trust and User Satisfaction

Arxiv

0+阅读 · 2023年2月1日

Learning Generalized Zero-Shot Learners for Open-Domain Image Geolocalization

Arxiv

0+阅读 · 2023年2月1日

A Cohesive Distillation Architecture for Neural Language Models

Arxiv

0+阅读 · 2023年1月29日

MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization

Arxiv

0+阅读 · 2023年1月28日

Graph Attention with Hierarchies for Multi-hop Question Answering

Arxiv

0+阅读 · 2023年1月27日

Generating Synthetic Data for Conversational Music Recommendation Using Random Walks and Language Models

Arxiv

0+阅读 · 2023年1月27日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

Contrastive learning of global and local features for medical image segmentation with limited annotations

Arxiv

19+阅读 · 2020年6月18日

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Arxiv

17+阅读 · 2018年3月20日

相关基金

基于全基因组的ncRNA 序列分析发掘结核分枝杆菌和巨噬细胞的互作机制

国家自然科学基金

0+阅读 · 2014年12月31日

低维纳米结构形貌演变对复合界面材料热量输运特性的影响

国家自然科学基金

0+阅读 · 2014年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

内皮细胞TRPV4-SKCa3耦联稳态失调在高血压血管功能稳态失调中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

重金属废水制备新型Ferrite/LDH纳米复合材料及其催化吸附机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

拟南芥DIF（DRIP1-Interacting Factor）在胁迫信号应答中的功能分析

国家自然科学基金

0+阅读 · 2012年12月31日

多时域材料分割算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

调控家蚕发育非编码RNA（non-coding RNA, ncRNA）的功能解析

国家自然科学基金

0+阅读 · 2011年12月31日

利用疏水相互作用构建磁/光纳米晶二元复合超晶格胶体微球

国家自然科学基金

0+阅读 · 2009年12月31日

空心的电磁功能化导电聚合物微/纳米结构的构筑

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员