Stopwords carry little semantic information and are often removed from text data to reduce dataset size and improve machine learning model performance. Consequently, researchers have sought to develop techniques for generating effective stopword sets. Previous approaches have ranged from qualitative techniques relying on linguistic experts to statistical approaches that estimate word importance from correlations or frequency-based metrics computed on a corpus. We present a novel quantitative approach that employs iterative and recursive feature-deletion algorithms to identify which words can be deleted from a pre-trained transformer's vocabulary with the least degradation in its performance, specifically on the task of sentiment analysis. Empirically, stopword lists generated with this approach drastically reduce dataset size with negligible impact on model performance; in one example, the corpus shrank by 28.4% while the accuracy of a trained logistic regression model improved by 0.25%. In another, the corpus shrank by 63.7% with only a 2.8% decrease in accuracy. These promising results indicate that our approach can generate highly effective stopword sets for specific NLP tasks.
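To make the idea concrete, the following is a minimal sketch of the greedy iterative variant of feature deletion, assuming a bag-of-words logistic-regression evaluator rather than the pre-trained transformer described above; `generate_stopwords`, `n_rounds`, and `tolerance` are hypothetical names for illustration, not the paper's implementation.

```python
# Sketch only: greedily delete the vocabulary word whose removal hurts
# validation accuracy the least, until any further deletion costs more
# than `tolerance` accuracy. All names here are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def generate_stopwords(texts, labels, n_rounds=10, tolerance=0.001):
    stopwords = set()
    X_train, X_val, y_train, y_val = train_test_split(texts, labels, random_state=0)

    def accuracy(excluded):
        # Re-vectorize with the candidate stopwords removed and refit the model.
        vec = CountVectorizer(stop_words=list(excluded) or None)
        model = LogisticRegression(max_iter=1000)
        model.fit(vec.fit_transform(X_train), y_train)
        return model.score(vec.transform(X_val), y_val)

    baseline = accuracy(stopwords)
    vocab = CountVectorizer().fit(X_train).get_feature_names_out()
    for _ in range(n_rounds):
        # Score every remaining word by the validation accuracy after deleting it.
        scores = {w: accuracy(stopwords | {w}) for w in vocab if w not in stopwords}
        best_word, best_acc = max(scores.items(), key=lambda kv: kv[1])
        if baseline - best_acc > tolerance:
            break  # deleting any further word degrades accuracy too much
        stopwords.add(best_word)
    return stopwords
```

Note that this naive loop refits one model per candidate word per round, which is far costlier than scoring deletions against a single pre-trained transformer; it is meant only to illustrate the deletion-and-evaluation cycle.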