Short text classification is a crucial and challenging task in Natural Language Processing, and numerous highly specialized short text classifiers have been developed for it. However, recent short text research has left state-of-the-art (SOTA) methods for traditional text classification, in particular the pure use of Transformers, largely unexploited. In this work, we examine the performance of a variety of short text classifiers as well as the top-performing traditional text classifier. We further evaluate these methods on two new real-world short text datasets to address the issue of over-reliance on benchmark datasets with a limited range of characteristics. Our experiments unambiguously demonstrate that Transformers achieve SOTA accuracy on short text classification tasks, raising the question of whether specialized short text techniques are necessary.