This paper investigates fake news detection as a downstream evaluation of Transformer representations, benchmarking encoder-only and decoder-only pre-trained models (BERT, GPT-2, Transformer-XL) as frozen embedders paired with lightweight classifiers. Through controlled comparisons of pooling versus padding strategies and neural versus linear classification heads, the results demonstrate that contextual self-attention encodings transfer effectively to this task. BERT embeddings combined with logistic regression outperform neural baselines on the LIAR dataset splits, while analyses of sequence length and aggregation reveal robustness to truncation and advantages from simple max or average pooling. This work positions attention-based token encoders as robust, architecture-centric foundations for veracity tasks, isolating the Transformer's contribution from classifier complexity.
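As a rough illustration of the pipeline the abstract describes, the sketch below pools frozen BERT token encodings into fixed-size vectors and feeds them to a logistic-regression head using Hugging Face transformers and scikit-learn. The model checkpoint, pooling options, truncation length, and the placeholder texts and labels are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of a frozen-embedder pipeline: BERT as a fixed feature
# extractor, token pooling, and a lightweight linear classifier.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()  # frozen: used purely as an embedder, no fine-tuning

def embed(texts, pooling="mean"):
    """Encode statements into fixed-size vectors by pooling token states."""
    feats = []
    with torch.no_grad():
        for text in texts:
            inputs = tokenizer(text, truncation=True, max_length=128,
                               return_tensors="pt")
            hidden = encoder(**inputs).last_hidden_state.squeeze(0)  # (seq_len, 768)
            if pooling == "mean":
                feats.append(hidden.mean(dim=0))
            else:  # max pooling over the token axis
                feats.append(hidden.max(dim=0).values)
    return torch.stack(feats).numpy()

# Placeholder LIAR-style statements and binary veracity labels.
texts = ["Example statement one.", "Example statement two."]
labels = [0, 1]

# Lightweight linear head on top of the frozen contextual embeddings.
clf = LogisticRegression(max_iter=1000).fit(embed(texts), labels)
print(clf.predict(embed(["A new statement to verify."])))
```

Swapping `pooling="mean"` for `"max"`, or replacing the checkpoint with a GPT-2 or Transformer-XL model, keeps the same frozen-embedder structure while varying the aggregation and architecture axes the abstract compares.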