In many real-world network environments, several types of cyberattacks occur at very low rates compared to benign traffic, making them difficult for intrusion detection systems (IDS) to detect reliably. Under this imbalance, traditional evaluation metrics such as accuracy often overstate model performance, masking failures on the minority attack classes that matter most in practice. In this paper, we evaluate a set of base and meta classifiers on low-traffic attacks in the CSE-CIC-IDS2018 dataset and compare their reliability as measured by accuracy and by the Matthews Correlation Coefficient (MCC). The results show that accuracy consistently inflates performance, while MCC gives a more faithful assessment of a classifier's performance across both majority and minority classes. Meta-classification methods, such as LogitBoost and AdaBoost, demonstrate more effective minority-class detection when measured by MCC, revealing trends that accuracy fails to capture. These findings support the need for imbalance-aware evaluation and indicate that MCC is a more trustworthy metric for IDS research involving low-traffic cyberattacks.
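As a minimal illustration of why accuracy can mask minority-class failures while MCC does not, the sketch below computes both metrics from binary confusion-matrix counts. The counts are hypothetical (10,000 flows, of which 50 are attacks) and are not drawn from the CSE-CIC-IDS2018 experiments; the function names are likewise assumptions made for this example. MCC is taken as 0 when its denominator is zero, which is a common convention rather than something the paper specifies.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Returns 0.0 when the denominator is zero (for example, when the
    classifier never predicts the positive class); this is a convention
    assumed here for illustration.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 9,950 benign flows and 50 attack flows.
# Classifier A labels every flow as benign and so detects no attacks.
a = dict(tp=0, tn=9950, fp=0, fn=50)
# Classifier B detects 40 of the 50 attacks at the cost of 60 false alarms.
b = dict(tp=40, tn=9890, fp=60, fn=10)

for name, counts in (("A", a), ("B", b)):
    print(f"{name}: accuracy={accuracy(**counts):.3f}  mcc={mcc(**counts):.3f}")
```

Under these assumed counts, classifier A attains the higher accuracy (0.995 versus 0.993) despite missing every attack, whereas MCC ranks classifier B clearly higher (about 0.56 versus 0).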