The standard ML methodology assumes that test samples are drawn from a set of pre-observed classes used in the training phase, from which the model extracts and learns useful patterns to detect new data samples belonging to the same classes. However, in certain applications such as Network Intrusion Detection Systems (NIDSs), it is challenging to obtain data samples for all the attack classes that the model is likely to observe in production. ML-based NIDSs face new attack traffic, known as zero-day attacks, that is not used in the training of the learning models because it did not exist at the time. In this paper, a zero-shot learning methodology is proposed to evaluate ML model performance in the detection of zero-day attack scenarios. In the attribute learning stage, the ML models map the network data features to semantic attributes that distinguish known attack (seen) classes. In the inference stage, the models are evaluated in the detection of zero-day attack (unseen) classes by constructing the relationships between known attacks and zero-day attacks. A new metric, the Zero-day Detection Rate, is defined to measure the effectiveness of the learning model in the inference stage. The results demonstrate that the majority of the attack classes do not represent significant risks to organisations adopting an ML-based NIDS in a zero-day attack scenario. However, for certain attack groups identified in this paper, such systems are not effective in applying the learnt attributes of attack behaviour to detect them as malicious. Further analysis was conducted using the Wasserstein Distance technique to measure how different such attacks are from the other attack types used in the training of the ML model. The results demonstrate that sophisticated attacks with a low zero-day detection rate have a significantly distinct feature distribution compared to the other attack classes.
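As an illustration of the two quantities mentioned above, the following is a minimal sketch and not the paper's implementation. It assumes that the Zero-day Detection Rate is the percentage of held-out (unseen) attack samples that a binary detector flags as malicious, and it computes a one-dimensional Wasserstein-1 distance for a single feature from two equal-sized samples. The function names and the toy values are hypothetical.

```python
from typing import Sequence


def zero_day_detection_rate(predictions: Sequence[int]) -> float:
    """Percentage of zero-day (unseen-class) samples flagged as malicious.

    Assumption: `predictions` holds binary model outputs (1 = attack,
    0 = benign) for samples that all belong to the held-out attack class.
    """
    if not predictions:
        raise ValueError("no zero-day samples to evaluate")
    detected = sum(1 for p in predictions if p == 1)
    return 100.0 * detected / len(predictions)


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-sized 1-D samples.

    For equal-sized empirical samples this reduces to the mean absolute
    difference between the sorted values.
    """
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


if __name__ == "__main__":
    # Hypothetical predictions for four samples of one unseen attack class.
    print(zero_day_detection_rate([1, 1, 0, 1]))
    # Hypothetical values of one feature for an unseen class vs. seen classes.
    print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```

Under these assumptions, a low detection rate for a class together with a large per-feature distance from the training classes would correspond to the pattern the abstract reports for sophisticated attacks.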