The popularity of graph neural networks has triggered a resurgence of graph-based methods for single-label and multi-label text classification. However, it is unclear whether these graph-based methods are beneficial compared to standard machine learning methods and modern pretrained language models. We compare a rich selection of bag-of-words, sequence-based, graph-based, and hierarchical methods for text classification. We aggregate results from the literature over 5 single-label and 7 multi-label datasets and run our own experiments. Our findings unambiguously demonstrate that for both single-label and multi-label classification tasks, the graph-based methods fail to outperform fine-tuned language models and sometimes even perform worse than standard machine learning methods like a multilayer perceptron (MLP) on a bag-of-words. This calls into question the enormous effort put into the development of new graph-based methods in recent years and the promises they make for text classification. Given our extensive experiments, we confirm that pretrained language models remain state-of-the-art in text classification despite all recent specialized advances. We argue that future work in text classification should thoroughly test against strong baselines like MLPs to properly assess true scientific progress. The source code is available at https://github.com/drndr/multilabel-text-clf
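To make the advocated baseline concrete, the following is a minimal sketch of an MLP on a TF-IDF bag-of-words using scikit-learn. The dataset, vocabulary size, and hyperparameters here are illustrative assumptions, not the exact configuration used in our experiments.

```python
# Minimal sketch of a bag-of-words MLP baseline for single-label text
# classification (illustrative hyperparameters, not the paper's exact setup).
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# A standard single-label benchmark (downloaded on first use).
train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

# Bag-of-words representation: TF-IDF over the training vocabulary.
vectorizer = TfidfVectorizer(max_features=30000)
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

# A single hidden layer already makes for a strong baseline.
clf = MLPClassifier(hidden_layer_sizes=(1024,), max_iter=30)
clf.fit(X_train, train.target)

print("Test accuracy:", accuracy_score(test.target, clf.predict(X_test)))
```

For multi-label datasets, the same recipe applies with a classifier that supports multi-label targets (e.g., wrapping the estimator in scikit-learn's `OneVsRestClassifier` over binarized labels).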