Unsupervised machine learning techniques have been applied to Natural Language Processing tasks with great success, surpassing benchmarks such as GLUE. A language model built for one language achieves strong results and can be applied out of the box to multiple NLP tasks such as classification, summarization, and generation. Among the classical approaches used in NLP, masked language modeling is the most widely used, and in general the only requirement for building a language model is the availability of a large corpus of textual data. Text classification engines use a variety of models, from classical approaches to state-of-the-art transformer models, to classify texts in order to save costs. Sequence Classifiers are the models most commonly used in the text classification domain, but Token Classifiers are viable candidates as well. Both tend to improve classification predictions because they capture context information differently. This work compares the performance of Sequence Classifiers and Token Classifiers by evaluating each model on the same dataset: we use a pre-trained model as the base, attach a Token Classifier head and a Sequence Classifier head to it, and compare the results of these two scoring paradigms.
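For concreteness, the sketch below illustrates the two scoring paradigms with the HuggingFace Transformers API: the same pre-trained base checkpoint is loaded once with a sequence-classification head (one label per text) and once with a token-classification head (one label per token). The checkpoint name, label count, and example input are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of the two classifier heads compared in this work.
# "bert-base-uncased" and num_labels=2 are assumed for illustration only.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
)

base = "bert-base-uncased"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)

# Sequence Classifier head: a single prediction for the whole text.
seq_model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Token Classifier head: one prediction per token, which can be
# aggregated (e.g. by majority vote) into a text-level label.
tok_model = AutoModelForTokenClassification.from_pretrained(base, num_labels=2)

inputs = tokenizer("An example sentence to classify.", return_tensors="pt")
with torch.no_grad():
    seq_logits = seq_model(**inputs).logits  # shape: (1, num_labels)
    tok_logits = tok_model(**inputs).logits  # shape: (1, seq_len, num_labels)

seq_pred = seq_logits.argmax(-1)  # one label for the sequence
tok_pred = tok_logits.argmax(-1)  # one label per token
```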