越南情绪分类是否必须使用字分割法? (Is word segmentation necessary for Vietnamese sentiment classification?) - 专知论文

会员服务 ·

0

情感分类 · MoDELS · 朴素贝叶斯 · 支持向量 · 语言模型化 ·

2023 年 1 月 1 日

Is word segmentation necessary for Vietnamese sentiment classification?

翻译：越南情绪分类是否必须使用字分割法?

Duc-Vu Nguyen,Ngan Luu-Thuy Nguyen

from arxiv, In Proceedings of the 16th International Conference on Computing and Communication Technologies (RIVF 2022)

To the best of our knowledge, this paper made the first attempt to answer whether word segmentation is necessary for Vietnamese sentiment classification. To do this, we presented five pre-trained monolingual S4- based language models for Vietnamese, including one model without word segmentation, and four models using RDRsegmenter, uitnlp, pyvi, or underthesea toolkits in the pre-processing data phase. According to comprehensive experimental results on two corpora, including the VLSP2016-SA corpus of technical article reviews from the news and social media and the UIT-VSFC corpus of the educational survey, we have two suggestions. Firstly, using traditional classifiers like Naive Bayes or Support Vector Machines, word segmentation maybe not be necessary for the Vietnamese sentiment classification corpus, which comes from the social domain. Secondly, word segmentation is necessary for Vietnamese sentiment classification when word segmentation is used before using the BPE method and feeding into the deep learning model. In this way, the RDRsegmenter is the stable toolkit for word segmentation among the uitnlp, pyvi, and underthesea toolkits.

翻译：就我们所知,本文首次试图回答越南情绪分类是否需要文字分割。为此,我们为越南人提出了五种预先训练的单一语言 S4 语言模型,包括一个无字分割模型,以及四个模型,使用RDRsegramenter、uitnlp、pyvi或预处理数据阶段的底模亚工具包。根据两个公司的综合实验结果,包括新闻和社会媒体的技术文章审查VLSP2016-SA汇编以及教育调查UIT-VSFC文集,我们有两个建议。首先,使用传统分类器,如Naive Bayes或支持Victor机器,越南情绪分类器可能不需要字分割,这些软件来自社会领域。第二,在使用BPE方法和输入深层学习模型之前使用字分割时,越南情绪分类需要文字分割。这样,RDRsegiment是立方、pyvi和底部的词分割工具。

0

相关内容

情感分类

情感分类是对带有感情色彩的主观性文本进行分析、推理的过程，即分析对说话人的态度，倾向正面，还是反面。它与传统的文本主题分类又不相同，传统主题分类是分析文本讨论的客观内容，而情感分类是要从文本中得到它是否支持某种观点的信息。

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

44+阅读 · 2020年12月18日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】MXNet深度情感分析实战

【推荐】MXNet深度情感分析实战

机器学习研究会

16+阅读 · 2017年10月4日

吸收倍增分离型波导Si/Ge雪崩探测器件的研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于SiPM的高性能In-Beam TOF-PET的研究

国家自然科学基金

0+阅读 · 2014年12月31日

Egr3调控造血干细胞功能的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于Exemplar-Classifier思想的高分辨率光学遥感影像目标识别研究

国家自然科学基金

2+阅读 · 2013年12月31日

短时空PS InSAR形变模型构建与稳健解算方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

基于小波和几何小波的脉冲星信号处理与天文图像去噪

国家自然科学基金

0+阅读 · 2011年12月31日

Drp-1基因在内质网应激诱导胰岛β32454;胞凋亡中的作用

国家自然科学基金

0+阅读 · 2009年12月31日

连云港海岸带城市蔓延过程中景观安全阈值研究

国家自然科学基金

0+阅读 · 2009年12月31日

Energy-based survival modelling using harmoniums

Arxiv

0+阅读 · 2023年2月28日

Text classification dataset and analysis for Uzbek language

Arxiv

0+阅读 · 2023年2月28日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

30+阅读 · 2021年7月28日

A Survey on Deep Learning for Named Entity Recognition

A Survey on Deep Learning for Named Entity Recognition

Arxiv

26+阅读 · 2020年3月13日

X-BERT: eXtreme Multi-label Text Classification with BERT

X-BERT: eXtreme Multi-label Text Classification with BERT

Arxiv

12+阅读 · 2019年7月4日

Deep learning for time series classification: a review

Arxiv

12+阅读 · 2019年3月14日

Graph Convolutional Networks for Text Classification

Arxiv

12+阅读 · 2018年9月15日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

Knowledge-based Fully Convolutional Network and Its Application in Segmentation of Lung CT Images

Arxiv

17+阅读 · 2018年5月22日

Contextual and Position-Aware Factorization Machines for Sentiment Classification

Arxiv

13+阅读 · 2018年1月18日

VIP会员

文章信息

相关主题

朴素贝叶斯

语言模型化

相关VIP内容

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

44+阅读 · 2020年12月18日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

最新，DeepSeek-R1论文登上Nature封面，附83页补充材料

人工智能与未来战争

自动驾驶中的轨迹预测大型基础模型：全面综述

万字长文《对抗雷达系统的电子战综述》

相关资讯

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】MXNet深度情感分析实战

【推荐】MXNet深度情感分析实战

机器学习研究会

16+阅读 · 2017年10月4日

相关论文

Energy-based survival modelling using harmoniums

Arxiv

0+阅读 · 2023年2月28日

Text classification dataset and analysis for Uzbek language

Arxiv

0+阅读 · 2023年2月28日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

30+阅读 · 2021年7月28日

A Survey on Deep Learning for Named Entity Recognition

A Survey on Deep Learning for Named Entity Recognition

Arxiv

26+阅读 · 2020年3月13日

X-BERT: eXtreme Multi-label Text Classification with BERT

X-BERT: eXtreme Multi-label Text Classification with BERT

Arxiv

12+阅读 · 2019年7月4日

Deep learning for time series classification: a review

Arxiv

12+阅读 · 2019年3月14日

Graph Convolutional Networks for Text Classification

Arxiv

12+阅读 · 2018年9月15日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

Knowledge-based Fully Convolutional Network and Its Application in Segmentation of Lung CT Images

Arxiv

17+阅读 · 2018年5月22日

Contextual and Position-Aware Factorization Machines for Sentiment Classification

Arxiv

13+阅读 · 2018年1月18日

相关基金

吸收倍增分离型波导Si/Ge雪崩探测器件的研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于SiPM的高性能In-Beam TOF-PET的研究

国家自然科学基金

0+阅读 · 2014年12月31日

Egr3调控造血干细胞功能的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于Exemplar-Classifier思想的高分辨率光学遥感影像目标识别研究

国家自然科学基金

2+阅读 · 2013年12月31日

短时空PS InSAR形变模型构建与稳健解算方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

基于小波和几何小波的脉冲星信号处理与天文图像去噪

国家自然科学基金

0+阅读 · 2011年12月31日

Drp-1基因在内质网应激诱导胰岛β32454;胞凋亡中的作用

国家自然科学基金

0+阅读 · 2009年12月31日

连云港海岸带城市蔓延过程中景观安全阈值研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员