Holmes: An Efficient and Lightweight Semantic Based Anomalous Email Detector

Email threats are a serious issue for enterprise security and cover a range of malicious scenarios, such as phishing, fraud, blackmail, and malvertising. Traditional anti-spam gateways typically maintain a greylist to filter out unwanted emails based on suspicious vocabulary in the mail subject and content. However, this signature-based approach cannot effectively discover novel and unknown suspicious emails that exploit current hot topics, such as COVID-19 and the US election. To address this problem, we present Holmes, an efficient and lightweight semantic-based engine for anomalous email detection. Holmes converts each email event log into a sentence through word embedding and then extracts interesting items among them through novelty detection. Based on our observations, we claim that, in an enterprise environment, the relationship between senders and receivers is stable, whereas suspicious emails commonly come from unusual sources, which can be detected through rareness selection. We evaluate Holmes in a real-world enterprise environment that sends and receives around 5,000 emails each day. Holmes achieves a high detection rate (it outputs around 200 suspicious emails per day) while maintaining a low false alarm rate for anomaly detection.
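
The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names, not the authors' method. It assumes a hypothetical `EmailLog` record with `sender`, `receiver`, and `subject` fields, uses a sparse bag-of-words vector as a stand-in for a learned word embedding, scores novelty as one minus the highest cosine similarity to any historical email, and treats a sender/receiver pair as rare when it appears fewer than `min_count` times in the history. Combining the two checks with a logical AND, and both threshold values, are likewise assumptions made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailLog:
    """Hypothetical event-log record; field names are assumptions."""
    sender: str
    receiver: str
    subject: str


def to_sentence(log):
    """Flatten one event log into a list of lowercase tokens."""
    return [log.sender.lower(), log.receiver.lower()] + log.subject.lower().split()


def embed(tokens):
    """Sparse bag-of-words vector (stand-in for a learned word embedding)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def novelty_score(log, history_vecs):
    """1 - max similarity to any historical email; higher means more novel."""
    if not history_vecs:
        return 1.0
    vec = embed(to_sentence(log))
    return 1.0 - max(cosine(vec, h) for h in history_vecs)


def detect(history, incoming, novelty_threshold=0.5, min_count=2):
    """Flag incoming emails that come from a rare pair AND look novel."""
    history_vecs = [embed(to_sentence(e)) for e in history]
    pair_counts = Counter((e.sender, e.receiver) for e in history)
    flagged = []
    for log in incoming:
        rare = pair_counts[(log.sender, log.receiver)] < min_count
        if rare and novelty_score(log, history_vecs) > novelty_threshold:
            flagged.append(log)
    return flagged


history = [EmailLog("alice@corp.example", "bob@corp.example", "weekly report")] * 3
incoming = [
    EmailLog("alice@corp.example", "bob@corp.example", "weekly report"),
    EmailLog("stranger@evil.example", "bob@corp.example",
             "urgent covid-19 invoice payment"),
]
flagged = detect(history, incoming)
```

In this toy run, the routine message between an established sender/receiver pair passes, while the message from an unseen sender with unfamiliar wording is flagged. A real deployment would replace the bag-of-words vectors with trained embeddings and tune both thresholds on actual mail logs.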
