Text classification of unseen classes is a challenging Natural Language Processing task that is mainly tackled with two types of approaches. Similarity-based approaches attempt to classify instances based on the similarity between text document representations and class description representations. Zero-shot text classification approaches aim to generalize knowledge gained from a training task by assigning appropriate labels of unknown classes to text documents. Although existing studies have already investigated individual approaches from these categories, the experiments in the literature do not provide a consistent comparison. This paper addresses this gap by conducting a systematic evaluation of different similarity-based and zero-shot approaches for text classification of unseen classes. Several state-of-the-art approaches are benchmarked on four text classification datasets, including a new dataset from the medical domain. Additionally, novel SimCSE- and SBERT-based baselines are proposed, as the baselines used in existing work yield weak classification results and are easily outperformed. Finally, the novel similarity-based Lbl2TransformerVec approach is presented, which outperforms previous state-of-the-art approaches in unsupervised text classification. Our experiments show that similarity-based approaches significantly outperform zero-shot approaches in most cases. Additionally, using SimCSE or SBERT embeddings instead of simpler text representations further improves similarity-based classification results.
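To make the similarity-based setup concrete, the following is a minimal sketch, not the paper's implementation, of classifying documents against class descriptions with SBERT embeddings and cosine similarity; the model checkpoint, class descriptions, and example documents are illustrative assumptions.

```python
# Minimal sketch of similarity-based classification of unseen classes:
# embed documents and class descriptions with an SBERT model and assign
# each document the label whose description is most similar (cosine
# similarity). Model name and example texts are illustrative assumptions,
# not the paper's exact configuration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")  # assumed SBERT checkpoint

class_descriptions = {
    "sports": "Articles about sports, athletes, matches, and competitions.",
    "medicine": "Articles about diseases, treatments, and clinical studies.",
}
documents = [
    "The striker scored twice in the final minutes of the match.",
    "The trial evaluated the drug's efficacy in patients with diabetes.",
]

label_names = list(class_descriptions.keys())
label_emb = model.encode(list(class_descriptions.values()), convert_to_tensor=True)
doc_emb = model.encode(documents, convert_to_tensor=True)

# Cosine similarity between every document and every class description.
scores = util.cos_sim(doc_emb, label_emb)
for doc, row in zip(documents, scores):
    predicted = label_names[int(row.argmax())]
    print(f"{predicted}: {doc}")
```

The same scheme works with SimCSE embeddings by swapping in a SimCSE encoder; no labeled training data for the target classes is required in either case.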