
October 6, 2022

Improving Large-scale Paraphrase Acquisition and Generation


Yao Dou, Chao Jiang, Wei Xu

From arXiv. The project webpage is at http://twitter-paraphrase.com/

This paper addresses the quality issues in existing Twitter-based paraphrase datasets and discusses the necessity of using two separate definitions of paraphrase for identification and generation tasks. We present a new Multi-Topic Paraphrase in Twitter (MultiPIT) corpus that consists of a total of 130k sentence pairs with crowdsourcing (MultiPIT_crowd) and expert (MultiPIT_expert) annotations using two different paraphrase definitions for paraphrase identification, in addition to a multi-reference test set (MultiPIT_NMR) and a large automatically constructed training set (MultiPIT_Auto) for paraphrase generation. With improved data annotation quality and a task-specific paraphrase definition, the best pre-trained language model fine-tuned on our dataset achieves state-of-the-art performance of 84.2 F1 for automatic paraphrase identification. Furthermore, our empirical results demonstrate that paraphrase generation models trained on MultiPIT_Auto generate more diverse and higher-quality paraphrases than their counterparts fine-tuned on other corpora such as Quora, MSCOCO, and ParaNMT.
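For readers unfamiliar with the F1 metric quoted above, the following is a minimal sketch of how an F1 score for binary paraphrase identification can be computed. It assumes F1 is measured on the positive (paraphrase) class with labels 1 = paraphrase and 0 = non-paraphrase; the abstract does not spell out the exact evaluation protocol, so this is an illustration rather than the authors' evaluation code. The function name `f1_score` and the toy labels are hypothetical and are not drawn from MultiPIT.

```python
from typing import Sequence


def f1_score(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 of the positive class (assumed: 1 = paraphrase, 0 = not)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        # No correct positive prediction: define F1 as 0 to avoid 0/0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration only (not MultiPIT data).
gold = [1, 1, 1, 0, 0, 1]
pred = [1, 0, 1, 1, 0, 1]
score = f1_score(gold, pred)
```

A reported value such as 84.2 F1 corresponds to this quantity multiplied by 100, under the assumption above.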

