In recent years, spammers have tried to obfuscate their intent by sending hybrid spam e-mails that combine image and text parts, which are more challenging to detect than e-mails containing only text or only images. The motivation behind this research is to design an effective approach for filtering out hybrid spam e-mails in situations where traditional text-only or image-only filters fail to detect them. To the best of our knowledge, only a few studies have addressed the detection of hybrid spam e-mails. Ordinarily, Optical Character Recognition (OCR) technology is used to eliminate the image parts of spam by transforming images into text. However, although OCR scanning is a very successful technique for processing text-and-image hybrid spam, it is not an effective solution for large volumes of e-mail because of the CPU power and execution time required to scan e-mail files. In addition, OCR techniques are not always reliable in the transformation process. To address these problems, we propose new late multi-modal fusion training frameworks for a text-and-image hybrid spam e-mail filtering system and compare them with the classical early fusion detection frameworks based on the OCR method. A Convolutional Neural Network (CNN) and Continuous Bag of Words (CBOW) were implemented to extract features from the image and text parts of hybrid spam, respectively. The generated features were then fed to a sigmoid layer and to machine-learning classifiers, including Random Forest (RF), Decision Tree (DT), Naive Bayes (NB) and Support Vector Machine (SVM), to classify each e-mail as ham or spam.
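The late-fusion idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the image model and the text model each emit a spam probability, and it swaps the trained sigmoid layer and the RF, DT, NB and SVM classifiers for a fixed-weight logistic combination. The function names, weights and threshold below are hypothetical and chosen only for illustration.

```python
import math


def _logit(p, eps=1e-6):
    """Map a probability to log-odds, clipping to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def late_fusion_score(p_text, p_image, w_text=1.0, w_image=1.0, bias=0.0):
    """Fuse per-modality spam probabilities after each model has run.

    p_text and p_image stand in for the outputs of the text and image
    models. The weights and bias are illustrative assumptions; in a real
    system they would be learned from labelled e-mails.
    """
    z = w_text * _logit(p_text) + w_image * _logit(p_image) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid over the fused log-odds


def classify(p_text, p_image, threshold=0.5):
    """Label an e-mail as 'spam' or 'ham' from the fused score."""
    return "spam" if late_fusion_score(p_text, p_image) >= threshold else "ham"


if __name__ == "__main__":
    # Hybrid case: the text part looks benign, the image part does not.
    print(classify(p_text=0.30, p_image=0.95))
```

In this sketch, an e-mail whose text part alone scores below the threshold can still be flagged when the image part is strongly spam-like. No OCR pass is needed at classification time, because each modality is scored by its own model before the scores are combined.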