Empirical Study of Text Augmentation on Social Media Text in Vietnamese - 专知 (Zhuanzhi) Papers

October 9, 2020

Empirical Study of Text Augmentation on Social Media Text in Vietnamese

Son T. Luu, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

From arXiv; accepted by the 34th Pacific Asia Conference on Language, Information and Computation

In text classification, label imbalance in a dataset degrades the performance of classification models. In practice, user comments on social networking sites are not all visible: administrators often allow only positive comments and hide negative ones. As a result, data collected from user comments on social networks is usually skewed toward one label, which makes the dataset imbalanced and degrades the model's performance. Data augmentation techniques are applied to address the imbalance between classes in the dataset and thereby improve the prediction model's accuracy. In this paper, we apply augmentation techniques to the VLSP2019 Hate Speech Detection on Vietnamese social texts dataset and to UIT-VSFC, the Vietnamese Students' Feedback Corpus for Sentiment Analysis. Augmentation increases the F1-macro score by about 1.5% on both corpora.
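To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' method. It shows how the F1-macro score exposes a classifier that ignores the minority class, and how a minority class might be topped up with augmented copies. The abstract does not name the augmentation techniques used, so random token swapping is an assumed stand-in here; the labels `clean` and `hate`, the toy comments, and all function names are hypothetical.

```python
import random
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores over the labels in y_true."""
    scores = []
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def random_swap(text, rng):
    """Return a copy of text with two randomly chosen tokens swapped."""
    tokens = text.split()
    if len(tokens) < 2:
        return text
    i, j = rng.sample(range(len(tokens)), 2)
    tokens[i], tokens[j] = tokens[j], tokens[i]
    return " ".join(tokens)


def balance_by_augmentation(texts, labels, seed=0):
    """Append augmented minority-class texts until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, l in zip(texts, labels) if l == label]
        for _ in range(target - count):
            out_texts.append(random_swap(rng.choice(pool), rng))
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical toy data: 8 majority-class comments, 2 minority-class comments.
texts = [f"this is a harmless comment number {i}" for i in range(8)]
texts += ["this is an abusive comment", "another abusive comment here"]
y_true = ["clean"] * 8 + ["hate"] * 2

# A classifier that always predicts the majority class.
y_pred = ["clean"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy)
print("macro F1:", round(macro_f1(y_true, y_pred), 4))

balanced_texts, balanced_labels = balance_by_augmentation(texts, y_true)
print("class counts after augmentation:", Counter(balanced_labels))
```

On this toy data the majority-class predictor reaches an accuracy of 0.8 but a macro F1 of only about 0.44, because the minority class contributes an F1 of 0. This is why the abstract reports F1-macro rather than accuracy alone. Augmentation of this kind should be applied to the training split only, so that evaluation still reflects the original label distribution.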

