Anomaly detection optimization using big data and deep learning to reduce false-positive

Anomaly-based Intrusion Detection Systems (IDS) have been a hot research topic because they can detect new threats, whereas signature-based IDS detect only threats whose signatures they have memorized. Interest has grown further as advanced technologies have increased both the number of hacking tools and the risk impact of an attack. The problem with any anomaly-based model is its high false-positive rate, which is the reason anomaly-based IDS are not commonly applied in practice. The false positives arise because an anomaly-based model classifies an unseen pattern as a threat even though the pattern may be normal and simply absent from the training dataset. This type of problem is called overfitting: the model is not able to generalize. Training anomaly-based models on a dataset large enough to include all possible normal cases might be an ideal solution, but it cannot be achieved in practice. Although we can increase the number of training samples to cover many more normal cases, we still need a model with a greater ability to generalize. In this research paper, we propose applying a deep model instead of traditional models because it has a greater ability to generalize. Thus, by combining big data with a deep model, we obtain fewer false positives. We compared machine learning and deep learning algorithms for optimizing anomaly-based IDS by decreasing the false-positive rate. We ran an experiment on the NSL-KDD benchmark and compared our results with one of the best classifiers used in traditional learning for IDS optimization. The experiment shows a 10% lower false-positive rate when using deep learning instead of traditional learning.
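The comparison above rests on the false-positive rate, so a minimal sketch of how that metric is commonly computed may help. It assumes the standard definition FPR = FP / (FP + TN), with label 1 for attack and 0 for normal traffic. The function name, the toy labels, and the two prediction lists are hypothetical illustrations. They are not NSL-KDD data and not the authors' results or code.

```python
def false_positive_rate(y_true, y_pred):
    """Return FP / (FP + TN), assuming label 1 = attack and 0 = normal."""
    # False positive: normal traffic flagged as an attack.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # True negative: normal traffic correctly left unflagged.
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if fp + tn == 0:
        raise ValueError("no normal samples; false-positive rate is undefined")
    return fp / (fp + tn)


# Hypothetical toy labels (NOT taken from NSL-KDD): five normal, three attack.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]

# Hypothetical predictions from two classifiers on the same samples.
baseline_pred = [1, 0, 1, 0, 1, 1, 0, 1]  # stands in for a traditional model
deep_pred = [1, 0, 0, 0, 1, 1, 0, 0]      # stands in for a deep model

fpr_baseline = false_positive_rate(y_true, baseline_pred)
fpr_deep = false_positive_rate(y_true, deep_pred)

print(f"baseline FPR: {fpr_baseline:.2f}")
print(f"deep FPR:     {fpr_deep:.2f}")
```

Because the rate is computed over normal samples only, missed attacks (false negatives) do not affect it. A fair comparison of two models would therefore normally report a detection rate alongside the false-positive rate.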
