With the increasing reliance of corporations and the general public on digital data and computer networks, cyber attacks have become a serious threat to the normal functioning of society. Intrusion detection systems seek to address this threat by detecting attacks in real time, preemptively where possible, and by attempting to block them or minimize their damage. These systems can work in several ways, some of which are based on artificial intelligence methods. Datasets containing both normal network traffic and cyber attacks are used to train these algorithms so that they can learn the underlying patterns of network-based data. CIDDS-001 is one of the most widely used datasets for network-based intrusion detection research. In the majority of works published so far on this dataset, the Class label has been used to train machine learning algorithms. However, CIDDS-001 contains another label, AttackType, which seems very promising for this purpose yet remains largely unexplored. This work compares two machine learning models, K-Nearest Neighbours and Random Forest, each trained with both labels, in order to ascertain whether AttackType can produce results as reliable as those obtained with the Class label.
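The comparison described above comes down to fitting the same classifier twice on identical features and changing only the target column. The following is a minimal, stdlib-only sketch of that idea for K-Nearest Neighbours, under explicit assumptions: the records, the feature choice (duration, packets, bytes) and all values are hypothetical toy data rather than real CIDDS-001 flows; the label strings are only meant to resemble the dataset's Class and AttackType columns; and leave-one-out accuracy is an illustrative evaluation choice, not the protocol used in this work.

```python
import math
from collections import Counter

# Hypothetical toy flow records: (features, class_label, attack_type).
# Features are (duration_s, packets, bytes). These values are invented for
# illustration and are NOT taken from CIDDS-001.
RECORDS = [
    ((0.5, 10, 1200), "normal", "---"),
    ((0.7, 12, 1500), "normal", "---"),
    ((0.6, 11, 1300), "normal", "---"),
    ((0.0, 1, 60), "attacker", "portScan"),
    ((0.0, 1, 58), "attacker", "portScan"),
    ((0.0, 2, 62), "attacker", "portScan"),
    ((5.0, 300, 40000), "attacker", "dos"),
    ((4.8, 280, 38000), "attacker", "dos"),
    ((5.2, 310, 41000), "attacker", "dos"),
]

# Tuple positions of the two alternative target labels in each record.
CLASS_COL = 1
ATTACK_TYPE_COL = 2


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(records, label_index, k=3):
    """Leave-one-out accuracy of k-NN using the chosen label column as target."""
    correct = 0
    for i, rec in enumerate(records):
        # Train on every record except the i-th, keeping only the chosen label.
        train = [(r[0], r[label_index]) for j, r in enumerate(records) if j != i]
        if knn_predict(train, rec[0], k) == rec[label_index]:
            correct += 1
    return correct / len(records)


if __name__ == "__main__":
    print("Class      :", loo_accuracy(RECORDS, CLASS_COL))
    print("AttackType :", loo_accuracy(RECORDS, ATTACK_TYPE_COL))
```

On real flow data the features would normally be scaled first, since unscaled byte counts dominate the Euclidean distance here. A Random Forest (for example scikit-learn's `RandomForestClassifier`) could be dropped into the same loop in place of `knn_predict`, so that the only variable between runs is the choice of label column.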