Comparison Study Between Token Classification and Sequence Classification in Text Classification

Unsupervised machine learning techniques have been applied to Natural Language Processing (NLP) tasks with great success, surpassing benchmarks such as GLUE. A language model built for one language achieves good results and can be applied out of the box to multiple NLP tasks such as classification, summarization, and generation. Among the classical approaches used in NLP, masked language modeling is the most widely used. In general, the only requirement for building a language model is a large corpus of textual data. Text classification engines use a variety of models, from classical models to state-of-the-art transformer models, to classify texts in order to save costs. Sequence classifiers are the most commonly used models in text classification; however, token classifiers are also viable candidates. Both sequence classifiers and token classifiers tend to improve classification predictions, because each captures context information differently. This work compares the performance of sequence classifiers and token classifiers by evaluating each model on the same set of data. We use a pre-trained model as the base model, attach a token classifier head and a sequence classifier head to it, and compare the results of these two scoring paradigms.
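The abstract describes one pre-trained encoder shared by two scoring heads. The sketch below illustrates the difference between them in plain Python, so it runs without downloading a model. It rests on several assumptions. The encoder output is mocked as a short list of hidden-state vectors. The sequence head scores only the first position (a [CLS]-style pooling choice). The token head scores every position and then averages the per-token logits into one document label. The abstract does not say how token-level predictions are turned into a text-level class, so that averaging step, every function name, and all numbers here are hypothetical. In a real setup the two heads would typically come from something like Hugging Face's `AutoModelForSequenceClassification` and `AutoModelForTokenClassification` loaded from the same checkpoint, but the abstract does not name a library.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # rows = output classes, columns = hidden dimensions


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Dense layer: one logit per class for a single hidden vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def sequence_head(hidden: List[Vector], weights: Matrix, bias: Vector) -> int:
    """Sequence classification: score only the first ([CLS]-style) position."""
    return argmax(linear(weights, bias, hidden[0]))


def token_head(hidden: List[Vector], weights: Matrix, bias: Vector) -> int:
    """Token classification: score every position, then average the per-token
    logits into one document-level score (hypothetical aggregation step)."""
    per_token = [linear(weights, bias, h) for h in hidden]
    n = len(per_token)
    mean_logits = [sum(tok[c] for tok in per_token) / n for c in range(len(bias))]
    return argmax(mean_logits)


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Made-up encoder outputs: 2 documents, 3 tokens each, hidden size 2.
# Both heads share the same frozen "encoder output", as in the comparison setup.
weights: Matrix = [[1.0, 0.0], [0.0, 1.0]]  # class 0 reads dim 0, class 1 reads dim 1
bias: Vector = [0.0, 0.0]
doc_a = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
doc_b = [[0.3, 0.7], [0.6, 0.4], [0.2, 0.8]]
docs = [doc_a, doc_b]
labels = [1, 1]  # hypothetical gold labels

seq_preds = [sequence_head(d, weights, bias) for d in docs]
tok_preds = [token_head(d, weights, bias) for d in docs]
print("sequence head:", seq_preds, "accuracy:", accuracy(seq_preds, labels))
print("token head:   ", tok_preds, "accuracy:", accuracy(tok_preds, labels))
```

On this made-up input the two heads disagree on the first document, which shows only that the two paradigms can diverge on identical encoder output; it says nothing about which one performs better on real data.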