This paper presents our submission to Task 2 of the Workshop on Noisy User-generated Text. We explore whether the performance of a pre-trained transformer-based language model, fine-tuned for text classification, can be improved through an ensemble implementation that makes use of corpus-level information and a handcrafted feature. We test how effectively these features accommodate the challenges of a noisy data set centred on a specific subject outside the remit of the pre-training data. We show that including the additional features can improve classification results, and our system achieves a score within 2 points of the top-performing team.
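The abstract does not state how the model output, the corpus-level information and the handcrafted feature are combined, so the following is only a minimal sketch of one possible arrangement, not the submitted system. It assumes the fine-tuned transformer has already produced a positive-class probability (`model_prob`), takes smoothed per-token log-odds from the training set as the corpus-level signal, and uses "contains a digit" as a stand-in handcrafted feature. All function names, the example feature and the weights are hypothetical.

```python
import math
import re
from collections import Counter


def term_log_odds(texts, labels, smoothing=1.0):
    """Assumed corpus-level signal: smoothed log-odds of each token
    appearing in positive (label 1) versus negative (label 0) texts."""
    pos, neg = Counter(), Counter()
    for text, label in zip(texts, labels):
        (pos if label == 1 else neg).update(text.lower().split())
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + smoothing * len(vocab)
    neg_total = sum(neg.values()) + smoothing * len(vocab)
    return {
        tok: math.log((pos[tok] + smoothing) / pos_total)
        - math.log((neg[tok] + smoothing) / neg_total)
        for tok in vocab
    }


def corpus_score(text, log_odds):
    """Mean log-odds over the tokens of one text; unseen tokens count as 0."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(log_odds.get(tok, 0.0) for tok in tokens) / len(tokens)


def has_number(text):
    """Hypothetical handcrafted feature: 1.0 if the text contains a digit."""
    return 1.0 if re.search(r"\d", text) else 0.0


def ensemble_probability(model_prob, text, log_odds, weights=(1.0, 1.0, 0.5)):
    """Combine the three signals in logit space and squash back to [0, 1].

    model_prob is assumed to come from the fine-tuned transformer;
    the weights are illustrative and would normally be tuned on held-out data.
    """
    eps = 1e-6
    p = min(max(model_prob, eps), 1.0 - eps)  # avoid log(0)
    model_logit = math.log(p / (1.0 - p))
    w_model, w_corpus, w_hand = weights
    z = (
        w_model * model_logit
        + w_corpus * corpus_score(text, log_odds)
        + w_hand * has_number(text)
    )
    return 1.0 / (1.0 + math.exp(-z))


# Toy training data, invented purely for illustration.
train_texts = [
    "120 new cases reported today",
    "stay safe everyone",
    "5 deaths confirmed in city",
    "i miss my friends",
]
train_labels = [1, 0, 1, 0]
log_odds = term_log_odds(train_texts, train_labels)

# An undecided model output (0.5) is nudged by the extra features.
informative = ensemble_probability(0.5, "3 cases reported", log_odds)
uninformative = ensemble_probability(0.5, "i miss everyone", log_odds)
```

In this arrangement the extra features only shift the transformer's logit, so a confident model prediction dominates while borderline cases are decided by the corpus-level and handcrafted signals. A learned meta-classifier over the same three inputs would be an equally plausible reading of "ensemble".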