Text generation models (TGMs) can now produce text that matches human writing style reasonably well. Detectors that distinguish TGM-generated text from human-written text play an important role in preventing abuse of TGMs. In this paper, we describe our pipeline for the two DIALOG-22 RuATD tasks: detecting generated text (binary task) and identifying which model generated a given text (multiclass task). We achieved 1st place on the binary classification task with an accuracy of 0.82995 on the private test set, and 4th place on the multiclass classification task with an accuracy of 0.62856 on the private test set. We propose an attention-based method for ensembling different pre-trained models.
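The abstract does not spell out how the attention-based ensemble is built, so the following is only a minimal sketch of one common way to combine classifiers with attention weights, not the authors' implementation. It assumes each pre-trained model outputs class probabilities for a single text, and that a hypothetical learned vector `query` scores each model's output; the function names `softmax` and `attention_ensemble` are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_ensemble(model_probs, query):
    """Combine per-model class probabilities for one text.

    model_probs[k][c] is the probability of class c from model k.
    query[c] is a hypothetical learned vector (an assumption of this
    sketch); its dot product with a model's output is that model's
    attention score.
    """
    # One attention score per model.
    scores = [sum(p * q for p, q in zip(probs, query)) for probs in model_probs]
    # Normalize the scores across models so the weights sum to 1.
    weights = softmax(scores)
    n_classes = len(model_probs[0])
    # Weighted average of the models' probability vectors.
    combined = [
        sum(w * probs[c] for w, probs in zip(weights, model_probs))
        for c in range(n_classes)
    ]
    return combined, weights


if __name__ == "__main__":
    # Two hypothetical models scoring one text as (human, generated).
    probs = [[0.9, 0.1], [0.6, 0.4]]
    combined, weights = attention_ensemble(probs, query=[1.0, -1.0])
    print(combined, weights)
```

Because the weights form a convex combination, the combined vector is still a valid probability distribution. With an all-zero `query` the sketch reduces to plain averaging, which makes a convenient baseline to compare against.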