The quality of artificially generated texts has improved considerably with the advent of transformers. The question of using these models to generate training data for supervised learning tasks naturally arises. In this article, this question is explored under three aspects: (i) are artificial data an efficient complement? (ii) can they replace the original data when those are not available or cannot be distributed for confidentiality reasons? (iii) can they improve the explainability of classifiers? Different experiments are carried out on Web-related classification tasks -- namely sentiment analysis on product reviews and Fake News detection -- using data artificially generated by fine-tuned GPT-2 models. The results show that such artificial data can be used to a certain extent but require pre-processing to significantly improve performance. We show that bag-of-words approaches benefit the most from such data augmentation.