This paper investigates the effectiveness of large language models (LLMs) in email spam detection by comparing prominent models from three distinct families: BERT-like, Sentence Transformers, and Seq2Seq. Additionally, we examine well-established machine learning techniques for spam detection, such as Naïve Bayes and LightGBM, as baseline methods. We assess the performance of these models across four public datasets, using different numbers of training samples (the full training set and few-shot settings). Our findings reveal that, in the majority of cases, LLMs surpass the popular baseline techniques, particularly in few-shot scenarios. This adaptability makes LLMs uniquely suited to spam detection tasks, where labeled samples are limited and models require frequent updates. Additionally, we introduce Spam-T5, a Flan-T5 model specifically adapted and fine-tuned for detecting email spam. Our results demonstrate that Spam-T5 surpasses baseline models and other LLMs in the majority of scenarios, particularly when only a limited number of training samples is available. Our code is publicly available at https://github.com/jpmorganchase/emailspamdetection.