The popularity, cost-effectiveness, and ease of information exchange that electronic mail offers to users of electronic devices have been undermined by the rising number of unsolicited, or spam, emails. Driven by the need to protect email users from this growing menace, research on spam email filtering and detection systems has been increasingly active over the last decade. However, the adaptive nature of spam emails has often rendered most of these systems ineffective. While several spam detection models have been reported in the literature, their reported performance on out-of-sample test data shows that there is room for improvement. This research presents an improved spam detection model based on Extreme Gradient Boosting (XGBoost), which, to the best of our knowledge, has received little attention in spam email detection problems. Experimental results show that the proposed model outperforms earlier approaches across a wide range of evaluation metrics. A thorough analysis of the model's results in comparison with the results of earlier works is also presented.